Lung Cancer Biomarkers and Uses Thereof

ABSTRACT

The present application includes biomarkers, methods, devices, reagents, systems, and kits for the detection and diagnosis of lung cancer. In one aspect, the application provides biomarkers that can be used alone or in various combinations to diagnose lung cancer or permit the differential diagnosis of pulmonary nodules as benign or malignant. In another aspect, methods are provided for diagnosing lung cancer in an individual, where the methods include detecting, in a biological sample from an individual, at least one biomarker value corresponding to at least one biomarker selected from the group of biomarkers provided in Table 18, Table 20, or Table 21, wherein the individual is classified as having lung cancer, or the likelihood of the individual having lung cancer is determined, based on the at least one biomarker value.

RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application Ser. No. 61/363,122, filed Jul. 9, 2010 and U.S. Provisional Application Ser. No. 61/444,947, filed Feb. 21, 2011, each of which is entitled “Lung Cancer Biomarkers and Uses Thereof”. Each of these applications is incorporated herein by reference in its entirety for all purposes.

FIELD OF THE INVENTION

The present application relates generally to the detection of biomarkers and the diagnosis of cancer in an individual and, more specifically, to one or more biomarkers, methods, devices, reagents, systems, and kits for diagnosing cancer, more particularly lung cancer, in an individual.

BACKGROUND

The following description provides a summary of information relevant to the present application and is not an admission that any of the information provided or publications referenced herein is prior art to the present application.

Lung cancer remains the most common cause of cancer-related mortality. This is true for both men and women. In 2005 in the United States, lung cancer accounted for more deaths than breast cancer, prostate cancer, and colon cancer combined. In that year, 107,416 men and 89,271 women were diagnosed with lung cancer, and 90,139 men and 69,078 women died from lung cancer. Among men in the United States, lung cancer is the second most common cancer among white, black, Asian/Pacific Islander, American Indian/Alaska Native, and Hispanic men. Among women in the United States, lung cancer is the second most common cancer among white, black, and American Indian/Alaska Native women, and the third most common cancer among Asian/Pacific Islander and Hispanic women. For those who do not quit smoking, the probability of death from lung cancer is 15% and remains above 5% even for those who quit at age 50-59. The annual healthcare cost of lung cancer in the U.S. alone is $95 billion.

Ninety-one percent of lung cancer caused by smoking is non-small cell lung cancer (NSCLC), which represents approximately 87% of all lung cancers. The remaining 13% of all lung cancers are small cell lung cancers, although mixed-cell lung cancers do occur. Because small cell lung cancer is rare and rapidly fatal, the opportunity for early detection is small.

There are three main types of NSCLC: squamous cell carcinoma, large cell carcinoma, and adenocarcinoma. Adenocarcinoma is the most common form of lung cancer (30%-40% and reported to be as high as 50%) and is the lung cancer most frequently found in both smokers and non-smokers. Squamous cell carcinoma accounts for 25-30% of all lung cancers and is generally found in a proximal bronchus. Early stage NSCLC tends to be localized, and if detected early it can often be treated by surgery with a favorable outcome and improved survival. Other treatment options include radiation treatment, drug therapy, and a combination of these methods.

NSCLC is staged by the size of the tumor and its presence in other tissues including lymph nodes. In the occult stage, cancer cells are found in sputum samples or lavage samples and no tumor is detectable in the lungs. In stage 0, only the innermost lining of the lungs exhibits cancer cells and the tumor has not grown through the lining. In stage IA, the cancer is considered invasive and has grown deep into the lung tissue but the tumor is less than 3 cm across. In this stage, the tumor is not found in the bronchus or lymph nodes. In stage IB, the tumor is either larger than 3 cm across or has grown into the bronchus or pleura, but has not grown into the lymph nodes. In stage IIA, the tumor is more than 3 cm across and has grown into the lymph nodes. In stage IIB, the tumor has either been found in the lymph nodes and is greater than 3 cm across or grown into the bronchus or pleura; or the cancer is not in the lymph nodes but is found in the chest wall, diaphragm, pleura, bronchus, or tissue that surrounds the heart. In stage IIIA, cancer cells are found in the lymph nodes near the lung and bronchi and in those between the lungs but on the side of the chest where the tumor is located. In stage IIIB, cancer cells are located on the opposite side of the chest from the tumor and in the neck. Other organs near the lungs may also have cancer cells and multiple tumors may be found in one lobe of the lungs. In stage IV, tumors are found in more than one lobe of the same lung or both lungs and cancer cells are found in other parts of the body.

Current methods of diagnosis for lung cancer include testing sputum for cancerous cells, chest X-ray, fiber optic evaluation of airways, and low dose spiral computed tomography (CT). Sputum cytology has a very low sensitivity. Chest X-ray is also relatively insensitive, requiring lesions to be greater than 1 cm in size to be visible. Bronchoscopy requires that the tumor is visible inside airways accessible to the bronchoscope. The most widely recognized diagnostic method is CT, but in common with X-ray, the use of CT involves ionizing radiation, which itself can cause cancer. CT also has significant limitations: the scans require a high level of technical skill to interpret, many of the observed abnormalities are not in fact lung cancer, and substantial healthcare costs are incurred in following up CT findings. The most common incidental finding is a benign lung nodule.

Lung nodules are relatively round lesions, or areas of abnormal tissue, located within the lung and may vary in size. Lung nodules may be benign or cancerous, but most are benign. If a nodule is below 4 mm, the prevalence of malignancy is only 1.5%; if 4-8 mm, the prevalence is approximately 6%; and if above 20 mm, the prevalence is approximately 20%. For small and medium-sized nodules, the patient is advised to undergo a repeat scan within three months to a year. For many large nodules, the patient receives a biopsy (which is invasive and may lead to complications) even though most of these are benign.

Therefore, diagnostic methods that can replace or complement CT are needed to reduce the number of surgical procedures conducted and minimize the risk of surgical complications. In addition, even when lung nodules are absent or unknown, methods are needed to detect lung cancer at its early stages to improve patient outcomes. Only 16% of lung cancer cases are diagnosed as localized, early stage cancer, where the 5-year survival rate is 46%, compared to 84% of those diagnosed at late stage, where the 5-year survival rate is only 13%. This demonstrates that relying on symptoms for diagnosis is not useful because many of them are common to other lung diseases. These symptoms include a persistent cough, bloody sputum, chest pain, and recurring bronchitis or pneumonia.

Where methods of early diagnosis of cancer exist, the benefits are generally accepted by the medical community. Cancers that have widely utilized screening protocols have the highest 5-year survival rates, such as breast cancer (88%) and colon cancer (65%) versus 16% for lung cancer. However, 88% of lung cancer patients survive ten years or longer if the cancer is diagnosed at Stage 1 through screening. This demonstrates the clear need for diagnostic methods that can reliably detect early-stage NSCLC.

Progression from healthy state to disease is accompanied by changes in protein expression in affected tissues. Comparative interrogation of the human proteome in healthy and diseased tissues can offer insights into the biology of disease and lead to discovery of biomarkers for diagnostics, new targets for therapeutic intervention, and identification of patients most likely to benefit from targeted treatment. Biomarker selection for a specific disease state involves first the identification of markers that have a measurable and statistically significant difference in a disease population compared to a control population for a specific medical application. Biomarkers can include secreted or shed molecules that parallel disease development or progression and readily diffuse into the blood stream from lung tissue or from distal tissues in response to a lesion. The biomarker or set of biomarkers identified are generally clinically validated or shown to be a reliable indicator for the original intended use for which it was selected. Biomarkers can include small molecules, peptides, proteins, and nucleic acids. Some of the key issues that affect the identification of biomarkers include over-fitting of the available data and bias in the data.

A variety of methods have been utilized in an attempt to identify biomarkers and diagnose disease. For protein-based markers, these include two-dimensional electrophoresis, mass spectrometry, and immunoassay methods. For nucleic acid markers, these include mRNA expression profiles, microRNA profiles, FISH, serial analysis of gene expression (SAGE), and large scale gene expression arrays.

The utility of two-dimensional electrophoresis is limited by low detection sensitivity; issues with protein solubility, charge, and hydrophobicity; gel reproducibility; and the possibility of a single spot representing multiple proteins. For mass spectrometry, depending on the format used, limitations revolve around the sample processing and separation, sensitivity to low abundance proteins, signal to noise considerations, and inability to immediately identify the detected protein. Limitations in immunoassay approaches to biomarker discovery are centered on the inability of antibody-based multiplex assays to measure a large number of analytes. One might simply print an array of high-quality antibodies and, without sandwiches, measure the analytes bound to those antibodies. (This would be the formal equivalent of using a whole genome of nucleic acid sequences to measure by hybridization all DNA or RNA sequences in an organism or a cell. The hybridization experiment works because hybridization can be a stringent test for identity. Even very good antibodies are not stringent enough in selecting their binding partners to work in the context of blood or even cell extracts because the proteins in those matrices have extremely different abundances.) Thus, one must use a different approach with immunoassay-based approaches to biomarker discovery: one would need to use multiplexed ELISA assays (that is, sandwiches) to get sufficient stringency to measure many analytes simultaneously to decide which analytes are indeed biomarkers. Sandwich immunoassays do not scale to high content, and thus biomarker discovery using stringent sandwich immunoassays is not possible using standard array formats. Lastly, antibody reagents are subject to substantial lot variability and reagent instability. The instant platform for protein biomarker discovery overcomes this problem.

Many of these methods rely on or require some type of sample fractionation prior to the analysis. Thus the sample preparation required to run a sufficiently powered study designed to identify/discover statistically relevant biomarkers in a series of well-defined sample populations is extremely difficult, costly, and time consuming. During fractionation, a wide range of variability can be introduced into the various samples. For example, a potential marker could be unstable to the process, the concentration of the marker could be changed, inappropriate aggregation or disaggregation could occur, and inadvertent sample contamination could occur and thus obscure the subtle changes anticipated in early disease.

It is widely accepted that biomarker discovery and detection methods using these technologies have serious limitations for the identification of diagnostic biomarkers. These limitations include an inability to detect low-abundance biomarkers, an inability to consistently cover the entire dynamic range of the proteome, irreproducibility in sample processing and fractionation, and overall irreproducibility and lack of robustness of the method. Further, these studies have introduced biases into the data and not adequately addressed the complexity of the sample populations, including appropriate controls, in terms of the distribution and randomization required to identify and validate biomarkers within a target disease population.

Although efforts aimed at the discovery of new and effective biomarkers have gone on for several decades, the efforts have been largely unsuccessful. Biomarkers for various diseases typically have been identified in academic laboratories, usually through an accidental discovery while doing basic research on some disease process. Based on the discovery and with small amounts of clinical data, papers were published that suggested the identification of a new biomarker. Most of these proposed biomarkers, however, have not been confirmed as real or useful biomarkers, primarily because the small number of clinical samples tested provides only weak statistical proof that an effective biomarker has in fact been found. That is, the initial identification was not rigorous with respect to the basic elements of statistics. In each of the years 1994 through 2003, a search of the scientific literature shows that thousands of references directed to biomarkers were published. During that same time frame, however, the FDA approved for diagnostic use, at most, three new protein biomarkers a year, and in several years no new protein biomarkers were approved.

Based on the history of failed biomarker discovery efforts, mathematical theories have been proposed that further promote the general understanding that biomarkers for disease are rare and difficult to find. Biomarker research based on 2D gels or mass spectrometry supports these notions. Very few useful biomarkers have been identified through these approaches. However, it is usually overlooked that 2D gel and mass spectrometry measure proteins that are present in blood at approximately 1 nM concentrations and higher, and that this ensemble of proteins may well be the least likely to change with disease. Other than the instant biomarker discovery platform, proteomic biomarker discovery platforms that are able to accurately measure protein expression levels at much lower concentrations do not exist.

Much is known about biochemical pathways for complex human biology. Many biochemical pathways culminate in or are started by secreted proteins that work locally within the pathology; for example, growth factors are secreted to stimulate the replication of other cells in the pathology, and other factors are secreted to ward off the immune system, and so on. While many of these secreted proteins work in a paracrine fashion, some operate distally in the body. One skilled in the art with a basic understanding of biochemical pathways would understand that many pathology-specific proteins ought to exist in blood at concentrations below (even far below) the detection limits of 2D gels and mass spectrometry. What must precede the identification of this relatively abundant number of disease biomarkers is a proteomic platform that can analyze proteins at concentrations below those detectable by 2D gels or mass spectrometry.

Accordingly, a need exists for biomarkers, methods, devices, reagents, systems, and kits that enable (a) the differentiation of benign pulmonary nodules from malignant pulmonary nodules; (b) the detection of lung cancer biomarkers; and (c) the diagnosis of lung cancer.

To fulfill this need, a novel aptamer-based proteomic technology for biomarker discovery, which is capable of simultaneously measuring thousands of proteins from small sample volumes of plasma or serum, has been developed (see e.g., U.S. Pub. No. 2010/0070191; U.S. Pub. No. 2010/0086948; Ostroff et al. Nature Precedings, http://precedings.nature.com/documents/4537/version/1 (2010); Gold et al. Nature Precedings, http://precedings.nature.com/documents/4538/version/1 (2010)). This technology, referred to as SOMAscan, is enabled by a new generation of slow off-rate aptamers (SOMAmers) that contain chemically modified nucleotides, which greatly expand the physicochemical diversity of the large randomized nucleic acid libraries from which the aptamers are selected (see U.S. Pat. No. 7,947,447). Such modifications, which are compatible with SELEX, introduce functional groups into aptamers that are often found in protein-protein interactions, antibody-antigen interactions, and interactions between small-molecule drugs and their protein targets. Overall, the use of these modifications expands the range of possible aptamer targets, improves their binding properties, and facilitates selection of aptamers with slow dissociation rates.

Specifically, proteins in complex matrices such as plasma are measured with a process that transforms a signature of protein concentrations into a corresponding signature of DNA aptamer concentrations, which is then quantified using a DNA microarray platform (Gold et al. Nature Precedings, http://precedings.nature.com/documents/4538/version/1 (2010)). The assay leverages equilibrium binding and kinetic challenge. Both are carried out in solution, not on a surface, to take advantage of more favorable kinetics of binding and dissociation. In essence, the assay takes advantage of the dual nature of aptamers as both folded binding entities with defined shapes and unique sequences recognizable by specific hybridization probes.

The assay is capable of simultaneously measuring large numbers of proteins ranging from low to high abundance in serum. For example, samples from 1,326 subjects from four independent studies of non-small cell lung cancer (NSCLC) have been analyzed in long-term tobacco-exposed populations. More than 800 proteins in 15 μL of serum were measured and a 12-protein panel was developed that distinguishes NSCLC from controls with 91% sensitivity and 84% specificity in a training set and 89% sensitivity and 83% specificity in a blinded, independent verification set. Importantly, performance was similar for early and late stage NSCLC (Ostroff et al. Nature Precedings, http://precedings.nature.com/documents/4537/version/1 (2010)).
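For reference, the sensitivity and specificity figures quoted above follow the usual definitions: sensitivity is the fraction of true NSCLC cases that a panel calls positive, and specificity is the fraction of controls that it calls negative. The following is a minimal sketch in Python of that arithmetic only. The function name and the toy labels are illustrative assumptions and are not data from the cited studies.

```python
def sensitivity_specificity(true_labels, predicted_labels):
    """Return (sensitivity, specificity) for binary labels.

    Labels are 1 for NSCLC and 0 for control. This encoding is an
    assumption made for this illustration only.
    """
    tp = fn = tn = fp = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == 1 and predicted == 1:
            tp += 1  # case correctly called positive
        elif truth == 1:
            fn += 1  # case missed by the panel
        elif predicted == 0:
            tn += 1  # control correctly called negative
        else:
            fp += 1  # control incorrectly called positive
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: 4 cases and 5 controls (hypothetical values).
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0]
calls = [1, 1, 1, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(truth, calls)
```

In this toy example three of four cases and four of five controls are called correctly, so the sketch yields a sensitivity of 0.75 and a specificity of 0.80.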

To date, several clinical biomarker studies of human diseases, including lung cancer (U.S. Pub. No. 2010/0070191), ovarian cancer (U.S. Pub. No. 2010/0086948), and chronic kidney disease, have been conducted using this method. These studies have identified novel potential disease biomarkers for each of these diseases as well as for cancer in general.

SUMMARY

The present application demonstrates the utility of the newly discovered microarray platform technology to identify disease-related biomarkers from tissue. The present application includes biomarkers, methods, reagents, devices, systems, and kits for the detection and diagnosis of cancer and, more particularly, lung cancer from tissue. The biomarkers of the present application were identified using a multiplex aptamer-based assay which is described in detail in Example 6. By using the aptamer-based biomarker identification method described herein, this application describes a surprisingly large number of lung cancer biomarkers from tissue that are useful for the detection and diagnosis of lung cancer. In identifying these biomarkers, over 800 proteins from a number of individual samples were measured, some of which were at concentrations in the low femtomolar range. This is about four orders of magnitude lower than biomarker discovery experiments done with 2D gels and/or mass spectrometry.

While certain of the described lung cancer biomarkers are useful alone for detecting and diagnosing lung cancer, methods are described herein for the grouping of multiple subsets of the lung cancer biomarkers that are useful as a panel of biomarkers. Once an individual biomarker or subset of biomarkers has been identified, the detection or diagnosis of lung cancer in an individual can be accomplished using any assay platform or format that is capable of measuring differences in the levels of the selected biomarker or biomarkers in a biological sample.

However, it was only by using the aptamer-based biomarker identification method described herein, wherein over 800 separate potential biomarker values were individually screened from a large number of individuals having previously been diagnosed either as having or not having lung cancer, that it was possible to identify the lung cancer biomarkers disclosed herein. This discovery approach is in stark contrast to biomarker discovery from conditioned media or lysed cells as it queries a more patient-relevant system that requires no translation to human pathology.

Thus, in one aspect of the instant application, one or more biomarkers are provided for use either alone or in various combinations to diagnose lung cancer, particularly non-small cell lung cancer (NSCLC), or permit the differential diagnosis of pulmonary nodules as benign or malignant. Exemplary embodiments include the biomarkers provided in Table 18, which, as noted above, were identified using a multiplex aptamer-based assay, as described generally in Example 1 and more specifically in Example 6. The markers provided in Table 18 are useful in distinguishing benign nodules from cancerous nodules. The markers provided in Table 18 are also useful in distinguishing asymptomatic smokers from smokers having lung cancer. In one aspect the biomarker is MMP-7. In another aspect the biomarker is MMP-12.

While certain of the described lung cancer biomarkers are useful alone for detecting and diagnosing lung cancer, methods are also described herein for the grouping of multiple subsets of the lung cancer biomarkers that are each useful as a panel of two or more biomarkers. Thus, various embodiments of the instant application provide combinations comprising N biomarkers, wherein N is at least two biomarkers. In other embodiments, N is selected to be any number from 2-36 biomarkers.

In yet other embodiments, N is selected to be any number from 2-7, 2-10, 2-15, 2-20, 2-25, 2-30, 2-36. In other embodiments, N is selected to be any number from 3-7, 3-10, 3-15, 3-20, 3-25, 3-30, 3-36. In other embodiments, N is selected to be any number from 4-7, 4-10, 4-15, 4-20, 4-25, 4-30, 4-36. In other embodiments, N is selected to be any number from 5-7, 5-10, 5-15, 5-20, 5-25, 5-30, 5-36. In other embodiments, N is selected to be any number from 6-10, 6-15, 6-20, 6-25, 6-30, 6-36. In other embodiments, N is selected to be any number from 7-10, 7-15, 7-20, 7-25, 7-30, 7-36. In other embodiments, N is selected to be any number from 8-10, 8-15, 8-20, 8-25, 8-30, 8-36. In other embodiments, N is selected to be any number from 9-15, 9-20, 9-25, 9-30, 9-36. In other embodiments, N is selected to be any number from 10-15, 10-20, 10-25, 10-30, 10-36. It will be appreciated that N can be selected to encompass similar, but higher order, ranges.
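To give a sense of how many distinct panels these ranges of N encompass, the following minimal Python sketch counts the unordered panels of size N that can be drawn from a pool of candidate biomarkers. The pool size of 36 is an assumption taken from the upper bound of the ranges recited above, and the function name is hypothetical.

```python
import math

# Assumed pool size: the upper bound of N recited in the text above.
POOL_SIZE = 36


def panel_count(n, pool_size=POOL_SIZE):
    """Number of distinct unordered panels of n biomarkers from the pool."""
    return math.comb(pool_size, n)


# Panel counts for a few of the recited values of N.
counts = {n: panel_count(n) for n in (2, 3, 10)}
```

Under that assumption there are 630 possible two-biomarker panels and 7,140 possible three-biomarker panels, and the count grows rapidly as N approaches half the pool size.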

In another aspect, a method is provided for diagnosing lung cancer in an individual, the method including detecting, in a biological sample from an individual, at least one biomarker value corresponding to at least one biomarker selected from the group of biomarkers provided in Table 18, wherein the individual is classified as having lung cancer based on the at least one biomarker value.

In another aspect, a method is provided for diagnosing lung cancer in an individual, the method including detecting, in a biological sample from an individual, biomarker values that each correspond to one of at least N biomarkers selected from the group of biomarkers set forth in Table 18, wherein the likelihood of the individual having lung cancer is determined based on the biomarker values.

In another aspect, a method is provided for diagnosing lung cancer in an individual, the method including detecting, in a biological sample from an individual, biomarker values that each correspond to one of at least N biomarkers selected from the group of biomarkers set forth in Table 18, wherein the individual is classified as having lung cancer based on the biomarker values, and wherein N=2-10.

In another aspect, a method is provided for diagnosing lung cancer in an individual, the method including detecting, in a biological sample from an individual, biomarker values that each correspond to one of at least N biomarkers selected from the group of biomarkers set forth in Table 18, wherein the likelihood of the individual having lung cancer is determined based on the biomarker values, and wherein N=2-10.

In another aspect, a method is provided for screening smokers for lung cancer, the method including detecting, in a biological sample from an individual who is a smoker, at least one biomarker value corresponding to at least one biomarker selected from the group of biomarkers set forth in Table 18, wherein the individual is classified as having lung cancer, or the likelihood of the individual having lung cancer is determined, based on the at least one biomarker value.

In another aspect, a method is provided for screening smokers for lung cancer, the method including detecting, in a biological sample from an individual who is a smoker, biomarker values that each correspond to one of at least N biomarkers selected from the group of biomarkers set forth in Table 18, wherein the individual is classified as having lung cancer, or the likelihood of the individual having lung cancer is determined, based on said biomarker values, wherein N=2-10.

In another aspect, a method is provided for diagnosing that an individual does not have lung cancer, the method including detecting, in a biological sample from an individual, at least one biomarker value corresponding to at least one biomarker selected from the group of biomarkers set forth in Table 18, wherein the individual is classified as not having lung cancer based on the at least one biomarker value.

In another aspect, a method is provided for diagnosing that an individual does not have lung cancer, the method including detecting, in a biological sample from an individual, biomarker values that each correspond to one of at least N biomarkers selected from the group of biomarkers set forth in Table 18, wherein the individual is classified as not having lung cancer based on the biomarker values, and wherein N=2-10.

In another aspect, a method is provided for diagnosing lung cancer, the method including detecting, in a biological sample from an individual, biomarker values that each correspond to a biomarker on a panel of N biomarkers, wherein the biomarkers are selected from the group of biomarkers set forth in Table 18, wherein a classification of the biomarker values indicates that the individual has lung cancer, and wherein N=3-10.

In another aspect, a method is provided for diagnosing lung cancer, the method including detecting, in a biological sample from an individual, biomarker values that each correspond to a biomarker on a panel of N biomarkers, wherein the biomarkers are selected from the group of biomarkers set forth in Table 18, wherein a classification of the biomarker values indicates that the individual has lung cancer, and wherein N=3-15.

In another aspect, a method is provided for screening smokers for lung cancer, the method including detecting, in a biological sample from an individual who is a smoker, biomarker values that each correspond to a biomarker on a panel of N biomarkers, wherein the biomarkers are selected from the group of biomarkers set forth in Table 18, wherein the individual is classified as having lung cancer, or the likelihood of the individual having lung cancer is determined, based on the biomarker values, and wherein N=3-10.

In another aspect, a method is provided for screening smokers for lung cancer, the method including detecting, in a biological sample from an individual who is a smoker, biomarker values that each correspond to a biomarker on a panel of N biomarkers, wherein the biomarkers are selected from the group of biomarkers set forth in Table 18, wherein the individual is classified as having lung cancer, or the likelihood of the individual having lung cancer is determined, based on the biomarker values, wherein N=3-15.

In another aspect, a method is provided for diagnosing an absence of lung cancer, the method including detecting, in a biological sample from an individual, biomarker values that each correspond to a biomarker on a panel of N biomarkers, wherein the biomarkers are selected from the group of biomarkers set forth in Table 18, wherein a classification of the biomarker values indicates an absence of lung cancer in the individual, and wherein N=3-10.

In another aspect, a method is provided for diagnosing an absence of lung cancer, the method including detecting, in a biological sample from an individual, biomarker values that each correspond to a biomarker on a panel of N biomarkers, wherein the biomarkers are selected from the group of biomarkers set forth in Table 18, wherein a classification of the biomarker values indicates an absence of lung cancer in the individual, and wherein N=3-15.

In another aspect, a method is provided for diagnosing lung cancer in an individual, the method including detecting, in a biological sample from an individual, biomarker values that correspond to one of at least N biomarkers selected from the group of biomarkers set forth in Table 18, wherein the individual is classified as having lung cancer based on a classification score that deviates from a predetermined threshold, and wherein N=2-10.

In another aspect, a method is provided for screening smokers for lung cancer, the method including detecting, in a biological sample from an individual who is a smoker, biomarker values that each correspond to a biomarker on a panel of N biomarkers, wherein the biomarkers are selected from the group of biomarkers set forth in Table 18, wherein the individual is classified as having lung cancer, or the likelihood of the individual having lung cancer is determined, based on a classification score that deviates from a predetermined threshold, wherein N=3-10.

In another aspect, a method is provided for screening smokers for lung cancer, the method including detecting, in a biological sample from an individual who is a smoker, biomarker values that each correspond to a biomarker on a panel of N biomarkers, wherein the biomarkers are selected from the group of biomarkers set forth in Table 18, wherein the individual is classified as having lung cancer, or the likelihood of the individual having lung cancer is determined, based on a classification score that deviates from a predetermined threshold, wherein N=3-15.

In another aspect, a method is provided for diagnosing an absence of lung cancer in an individual, the method including detecting, in a biological sample from an individual, biomarker values that correspond to one of at least N biomarkers selected from the group of biomarkers set forth in Table 18, wherein said individual is classified as not having lung cancer based on a classification score that deviates from a predetermined threshold, and wherein N=2-10.
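By way of illustration only, classification against a predetermined threshold as recited in the preceding aspects can be sketched in Python as follows. The linear score over log-transformed values, the weights, the bias, the threshold, and the direction of deviation (a score above the threshold indicating lung cancer) are all hypothetical assumptions chosen for this sketch; they are not values or a model taken from this application. MMP-7 and MMP-12 are used as example names only because they are named above.

```python
import math


def classification_score(biomarker_values, weights, bias=0.0):
    """Hypothetical linear score over log10-transformed biomarker values."""
    return bias + sum(
        weights[name] * math.log10(value)
        for name, value in biomarker_values.items()
    )


def classify(score, threshold=0.0):
    """Assumed convention: a score above the threshold indicates lung cancer."""
    return "lung cancer" if score > threshold else "no lung cancer"


# Hypothetical measurements (arbitrary units) and hypothetical weights.
sample = {"MMP-7": 1000.0, "MMP-12": 100.0}
weights = {"MMP-7": 1.0, "MMP-12": 0.5}

score = classification_score(sample, weights, bias=-3.5)
result = classify(score, threshold=0.0)
```

With these made-up numbers the score is 1.0 × 3 + 0.5 × 2 − 3.5 = 0.5, which lies above the assumed threshold of 0, so the sketch returns the "lung cancer" classification. A score at or below the threshold would return the opposite classification, corresponding to the absence-of-lung-cancer aspects.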

In another aspect, a computer-implemented method is provided for indicating a likelihood of lung cancer. The method comprises: retrieving on a computer biomarker information for an individual, wherein the biomarker information comprises biomarker values that each correspond to one of at least N biomarkers, wherein N is as defined above, selected from the group of biomarkers set forth in Table 18; performing with the computer a classification of each of the biomarker values; and indicating a likelihood that the individual has lung cancer based upon a plurality of classifications.

In another aspect, a computer-implemented method is provided for classifying an individual as either having or not having lung cancer. The method comprises: retrieving on a computer biomarker information for an individual, wherein the biomarker information comprises biomarker values that each correspond to one of at least N biomarkers selected from the group of biomarkers provided in Table 18; performing with the computer a classification of each of the biomarker values; and indicating whether the individual has lung cancer based upon a plurality of classifications.

In another aspect, a computer program product is provided for indicating a likelihood of lung cancer. The computer program product includes a computer readable medium embodying program code executable by a processor of a computing device or system, the program code comprising: code that retrieves data attributed to a biological sample from an individual, wherein the data comprises biomarker values that each correspond to one of at least N biomarkers, wherein N is as defined above, in the biological sample selected from the group of biomarkers set forth in Table 18; and code that executes a classification method that indicates a likelihood that the individual has lung cancer as a function of the biomarker values.

In another aspect, a computer program product is provided for indicating a lung cancer status of an individual. The computer program product includes a computer readable medium embodying program code executable by a processor of a computing device or system, the program code comprising: code that retrieves data attributed to a biological sample from an individual, wherein the data comprises biomarker values that each correspond to one of at least N biomarkers in the biological sample selected from the group of biomarkers provided in Table 18; and code that executes a classification method that indicates a lung cancer status of the individual as a function of the biomarker values.

In another aspect, a computer-implemented method is provided for indicating a likelihood of lung cancer. The method comprises retrieving on a computer biomarker information for an individual, wherein the biomarker information comprises a biomarker value corresponding to a biomarker selected from the group of biomarkers set forth in Table 18; performing with the computer a classification of the biomarker value; and indicating a likelihood that the individual has lung cancer based upon the classification.

In another aspect, a computer-implemented method is provided for classifying an individual as either having or not having lung cancer. The method comprises retrieving from a computer biomarker information for an individual, wherein the biomarker information comprises a biomarker value corresponding to a biomarker selected from the group of biomarkers provided in Table 18; performing with the computer a classification of the biomarker value; and indicating whether the individual has lung cancer based upon the classification.

In still another aspect, a computer program product is provided for indicating a likelihood of lung cancer. The computer program product includes a computer readable medium embodying program code executable by a processor of a computing device or system, the program code comprising: code that retrieves data attributed to a biological sample from an individual, wherein the data comprises a biomarker value corresponding to a biomarker in the biological sample selected from the group of biomarkers set forth in Table 18; and code that executes a classification method that indicates a likelihood that the individual has lung cancer as a function of the biomarker value.

In still another aspect, a computer program product is provided for indicating a lung cancer status of an individual. The computer program product includes a computer readable medium embodying program code executable by a processor of a computing device or system, the program code comprising: code that retrieves data attributed to a biological sample from an individual, wherein the data comprises a biomarker value corresponding to a biomarker in the biological sample selected from the group of biomarkers provided in Table 18; and code that executes a classification method that indicates a lung cancer status of the individual as a function of the biomarker value.

In another embodiment of the instant application, exemplary embodiments include the biomarkers provided in Table 20, which, as noted above, were identified using a multiplex aptamer-based assay, as described generally in Example 1 and more specifically in Example 6. The markers provided in Table 20 are useful in distinguishing benign nodules from cancerous nodules. The markers provided in Table 20 are also useful in distinguishing asymptomatic smokers from smokers having lung cancer. With reference to Table 20, N is selected to be any number from 2-25 biomarkers. The markers provided in Table 20 have been determined to be useful in both tissue and serum samples.

In yet other embodiments, N is selected to be any number from 2-7, 2-10, 2-15, 2-20, or 2-25. In other embodiments, N is selected to be any number from 3-7, 3-10, 3-15, 3-20, or 3-25. In other embodiments, N is selected to be any number from 4-7, 4-10, 4-15, 4-20, or 4-25. In other embodiments, N is selected to be any number from 5-7, 5-10, 5-15, 5-20, or 5-25. In other embodiments, N is selected to be any number from 6-10, 6-15, 6-20, or 6-25. In other embodiments, N is selected to be any number from 7-10, 7-15, 7-20, or 7-25. In other embodiments, N is selected to be any number from 8-10, 8-15, 8-20, or 8-25. In other embodiments, N is selected to be any number from 9-15, 9-20, or 9-25. In other embodiments, N is selected to be any number from 10-15, 10-20, or 10-25. It will be appreciated that N can be selected to encompass similar, but higher order, ranges.

In another aspect, a method is provided for diagnosing lung cancer in an individual, the method including detecting, in a biological sample from an individual, at least one biomarker value corresponding to at least one biomarker selected from the group of biomarkers provided in Table 20, wherein the individual is classified as having lung cancer based on the at least one biomarker value.

In another aspect, a method is provided for diagnosing lung cancer in an individual, the method including detecting, in a biological sample from an individual, biomarker values that each correspond to one of at least N biomarkers selected from the group of biomarkers set forth in Table 20, wherein the likelihood of the individual having lung cancer is determined based on the biomarker values.

In another aspect, a method is provided for diagnosing lung cancer in an individual, the method including detecting, in a biological sample from an individual, biomarker values that each correspond to one of at least N biomarkers selected from the group of biomarkers set forth in Table 20, wherein the individual is classified as having lung cancer based on the biomarker values, and wherein N=2-10.

In another aspect, a method is provided for diagnosing lung cancer in an individual, the method including detecting, in a biological sample from an individual, biomarker values that each correspond to one of at least N biomarkers selected from the group of biomarkers set forth in Table 20, wherein the likelihood of the individual having lung cancer is determined based on the biomarker values, and wherein N=2-10.

In another aspect, a method is provided for screening smokers for lung cancer, the method including detecting, in a biological sample from an individual who is a smoker, at least one biomarker value corresponding to at least one biomarker selected from the group of biomarkers set forth in Table 20, wherein the individual is classified as having lung cancer, or the likelihood of the individual having lung cancer is determined, based on the at least one biomarker value.

In another aspect, a method is provided for screening smokers for lung cancer, the method including detecting, in a biological sample from an individual who is a smoker, biomarker values that each correspond to one of at least N biomarkers selected from the group of biomarkers set forth in Table 20, wherein the individual is classified as having lung cancer, or the likelihood of the individual having lung cancer is determined, based on said biomarker values, wherein N=2-10.

In another aspect, a method is provided for diagnosing that an individual does not have lung cancer, the method including detecting, in a biological sample from an individual, at least one biomarker value corresponding to at least one biomarker selected from the group of biomarkers set forth in Table 20, wherein the individual is classified as not having lung cancer based on the at least one biomarker value.

In another aspect, a method is provided for diagnosing that an individual does not have lung cancer, the method including detecting, in a biological sample from an individual, biomarker values that each correspond to one of at least N biomarkers selected from the group of biomarkers set forth in Table 20, wherein the individual is classified as not having lung cancer based on the biomarker values, and wherein N=2-10.

In another aspect, a method is provided for diagnosing lung cancer, the method including detecting, in a biological sample from an individual, biomarker values that each correspond to a biomarker on a panel of N biomarkers, wherein the biomarkers are selected from the group of biomarkers set forth in Table 20, wherein a classification of the biomarker values indicates that the individual has lung cancer, and wherein N=3-10.

In another aspect, a method is provided for diagnosing lung cancer, the method including detecting, in a biological sample from an individual, biomarker values that each correspond to a biomarker on a panel of N biomarkers, wherein the biomarkers are selected from the group of biomarkers set forth in Table 20, wherein a classification of the biomarker values indicates that the individual has lung cancer, and wherein N=3-15.

In another aspect, a method is provided for screening smokers for lung cancer, the method including detecting, in a biological sample from an individual who is a smoker, biomarker values that each correspond to a biomarker on a panel of N biomarkers, wherein the biomarkers are selected from the group of biomarkers set forth in Table 20, wherein the individual is classified as having lung cancer, or the likelihood of the individual having lung cancer is determined, based on the biomarker values, and wherein N=3-10.

In another aspect, a method is provided for screening smokers for lung cancer, the method including detecting, in a biological sample from an individual who is a smoker, biomarker values that each correspond to a biomarker on a panel of N biomarkers, wherein the biomarkers are selected from the group of biomarkers set forth in Table 20, wherein the individual is classified as having lung cancer, or the likelihood of the individual having lung cancer is determined, based on the biomarker values, wherein N=3-15.

In another aspect, a method is provided for diagnosing an absence of lung cancer, the method including detecting, in a biological sample from an individual, biomarker values that each correspond to a biomarker on a panel of N biomarkers, wherein the biomarkers are selected from the group of biomarkers set forth in Table 20, wherein a classification of the biomarker values indicates an absence of lung cancer in the individual, and wherein N=3-10.

In another aspect, a method is provided for diagnosing an absence of lung cancer, the method including detecting, in a biological sample from an individual, biomarker values that each correspond to a biomarker on a panel of N biomarkers, wherein the biomarkers are selected from the group of biomarkers set forth in Table 20, wherein a classification of the biomarker values indicates an absence of lung cancer in the individual, and wherein N=3-15.

In another aspect, a method is provided for diagnosing lung cancer in an individual, the method including detecting, in a biological sample from an individual, biomarker values that correspond to one of at least N biomarkers selected from the group of biomarkers set forth in Table 20, wherein the individual is classified as having lung cancer based on a classification score that deviates from a predetermined threshold, and wherein N=2-10.

In another aspect, a method is provided for screening smokers for lung cancer, the method including detecting, in a biological sample from an individual who is a smoker, biomarker values that each correspond to a biomarker on a panel of N biomarkers, wherein the biomarkers are selected from the group of biomarkers set forth in Table 20, wherein the individual is classified as having lung cancer, or the likelihood of the individual having lung cancer is determined, based on a classification score that deviates from a predetermined threshold, wherein N=3-10.

In another aspect, a method is provided for screening smokers for lung cancer, the method including detecting, in a biological sample from an individual who is a smoker, biomarker values that each correspond to a biomarker on a panel of N biomarkers, wherein the biomarkers are selected from the group of biomarkers set forth in Table 20, wherein the individual is classified as having lung cancer, or the likelihood of the individual having lung cancer is determined, based on a classification score that deviates from a predetermined threshold, wherein N=3-15.

In another aspect, a method is provided for diagnosing an absence of lung cancer in an individual, the method including detecting, in a biological sample from an individual, biomarker values that correspond to one of at least N biomarkers selected from the group of biomarkers set forth in Table 20, wherein said individual is classified as not having lung cancer based on a classification score that deviates from a predetermined threshold, and wherein N=2-10.
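As a non-limiting sketch, a classification score that is compared against a predetermined threshold might be computed as shown below. The logistic form of the score, the weights, the intercept, the threshold of 0.5, and the marker names are all assumptions introduced for illustration; none of them is taken from Table 20 or from the Examples.

```python
import math
from typing import Dict

# Hypothetical model parameters (illustration only, not from the application).
WEIGHTS: Dict[str, float] = {"marker_a": 1.2, "marker_b": -0.7}
INTERCEPT = -0.5
THRESHOLD = 0.5  # assumed predetermined threshold


def classification_score(values: Dict[str, float]) -> float:
    """Combine biomarker values into a single score in the interval (0, 1)."""
    linear = INTERCEPT + sum(WEIGHTS[name] * values[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-linear))


def classified_as_no_cancer(values: Dict[str, float],
                            threshold: float = THRESHOLD) -> bool:
    """True when the score falls below the threshold (absence of lung cancer)."""
    return classification_score(values) < threshold


# Two hypothetical individuals.
low_risk = {"marker_a": 0.2, "marker_b": 1.0}
high_risk = {"marker_a": 2.0, "marker_b": 0.1}
```

In this sketch, a score below the threshold is treated as the deviation that supports a classification of not having lung cancer; the direction of the deviation is likewise an assumption.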

In another aspect, a computer-implemented method is provided for indicating a likelihood of lung cancer. The method comprises: retrieving on a computer biomarker information for an individual, wherein the biomarker information comprises biomarker values that each correspond to one of at least N biomarkers, wherein N is as defined above, selected from the group of biomarkers set forth in Table 20; performing with the computer a classification of each of the biomarker values; and indicating a likelihood that the individual has lung cancer based upon a plurality of classifications.

In another aspect, a computer-implemented method is provided for classifying an individual as either having or not having lung cancer. The method comprises: retrieving on a computer biomarker information for an individual, wherein the biomarker information comprises biomarker values that each correspond to one of at least N biomarkers selected from the group of biomarkers provided in Table 20; performing with the computer a classification of each of the biomarker values; and indicating whether the individual has lung cancer based upon a plurality of classifications.

In another aspect, a computer program product is provided for indicating a likelihood of lung cancer. The computer program product includes a computer readable medium embodying program code executable by a processor of a computing device or system, the program code comprising: code that retrieves data attributed to a biological sample from an individual, wherein the data comprises biomarker values that each correspond to one of at least N biomarkers, wherein N is as defined above, in the biological sample selected from the group of biomarkers set forth in Table 20; and code that executes a classification method that indicates a likelihood that the individual has lung cancer as a function of the biomarker values.

In another aspect, a computer program product is provided for indicating a lung cancer status of an individual. The computer program product includes a computer readable medium embodying program code executable by a processor of a computing device or system, the program code comprising: code that retrieves data attributed to a biological sample from an individual, wherein the data comprises biomarker values that each correspond to one of at least N biomarkers in the biological sample selected from the group of biomarkers provided in Table 20; and code that executes a classification method that indicates a lung cancer status of the individual as a function of the biomarker values.
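For illustration, the two pieces of program code described above (code that retrieves data attributed to a sample, and code that executes a classification method) might be organized as in the following sketch. The CSV layout, the column names, the function names, the placeholder marker identifiers other than MMP-12, and the mean-value rule used as the classification method are all hypothetical and are introduced only so that the sketch is runnable.

```python
import csv
import io
from typing import Dict

# Placeholder stand-in for Table 20; only MMP-12 is named in the text.
TABLE_20 = {"MMP-12", "marker_x", "marker_y", "marker_z"}

# Hypothetical stored data attributed to biological samples.
DATA = "\n".join([
    "sample_id,biomarker,value",
    "s1,MMP-12,1.4",
    "s1,marker_x,1.0",
    "s1,unlisted,9.0",
    "s2,MMP-12,0.3",
])


def retrieve_values(csv_text: str, sample_id: str) -> Dict[str, float]:
    """Retrieve values for one sample, keeping only biomarkers in the table."""
    values = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["sample_id"] == sample_id and row["biomarker"] in TABLE_20:
            values[row["biomarker"]] = float(row["value"])
    return values


def lung_cancer_status(values: Dict[str, float], n_min: int = 2) -> str:
    """Indicate a status as a function of the values (toy rule, assumed)."""
    if len(values) < n_min:
        raise ValueError(f"need at least {n_min} biomarker values")
    mean_value = sum(values.values()) / len(values)
    return "lung cancer" if mean_value > 1.0 else "no lung cancer"


status_s1 = lung_cancer_status(retrieve_values(DATA, "s1"))
```

The check on `n_min` reflects the requirement that values be available for at least N biomarkers; any actual classification method would replace the toy mean-value rule.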

In another aspect, a computer-implemented method is provided for indicating a likelihood of lung cancer. The method comprises retrieving on a computer biomarker information for an individual, wherein the biomarker information comprises a biomarker value corresponding to a biomarker selected from the group of biomarkers set forth in Table 20; performing with the computer a classification of the biomarker value; and indicating a likelihood that the individual has lung cancer based upon the classification.

In another aspect, a computer-implemented method is provided for classifying an individual as either having or not having lung cancer. The method comprises retrieving from a computer biomarker information for an individual, wherein the biomarker information comprises a biomarker value corresponding to a biomarker selected from the group of biomarkers provided in Table 20; performing with the computer a classification of the biomarker value; and indicating whether the individual has lung cancer based upon the classification.

In still another aspect, a computer program product is provided for indicating a likelihood of lung cancer. The computer program product includes a computer readable medium embodying program code executable by a processor of a computing device or system, the program code comprising: code that retrieves data attributed to a biological sample from an individual, wherein the data comprises a biomarker value corresponding to a biomarker in the biological sample selected from the group of biomarkers set forth in Table 20; and code that executes a classification method that indicates a likelihood that the individual has lung cancer as a function of the biomarker value.

In still another aspect, a computer program product is provided for indicating a lung cancer status of an individual. The computer program product includes a computer readable medium embodying program code executable by a processor of a computing device or system, the program code comprising: code that retrieves data attributed to a biological sample from an individual, wherein the data comprises a biomarker value corresponding to a biomarker in the biological sample selected from the group of biomarkers provided in Table 20; and code that executes a classification method that indicates a lung cancer status of the individual as a function of the biomarker value.

In another embodiment of the instant application, exemplary embodiments include the biomarkers provided in Table 21, which were identified using a multiplex aptamer-based assay, as described generally in Example 1 and more specifically in Examples 2 and 6. The markers provided in Table 21 are useful in distinguishing benign nodules from cancerous nodules. The markers provided in Table 21 are also useful in distinguishing asymptomatic smokers from smokers having lung cancer. With reference to Table 21, N is selected to be any number from 2-86 biomarkers. All of the biomarkers included in Table 21 are useful in providing the information being sought in both tissue and serum samples.

In yet other embodiments, N is selected to be any number from 2-7, 2-10, 2-15, 2-20, 2-25, 2-30, 2-35, 2-40, 2-45, 2-50, 2-55, 2-60, 2-65, 2-70, 2-75, 2-80, or 2-86. In other embodiments, N is selected to be any number from 3-7, 3-10, 3-15, 3-20, 3-25, 3-30, 3-35, 3-40, 3-45, 3-50, 3-55, 3-60, 3-65, 3-70, 3-75, 3-80, or 3-86. In other embodiments, N is selected to be any number from 4-7, 4-10, 4-15, 4-20, 4-25, 4-30, 4-35, 4-40, 4-45, 4-50, 4-55, 4-60, 4-65, 4-70, 4-75, 4-80, or 4-86. In other embodiments, N is selected to be any number from 5-7, 5-10, 5-15, 5-20, 5-25, 5-30, 5-35, 5-40, 5-45, 5-50, 5-55, 5-60, 5-65, 5-70, 5-75, 5-80, or 5-86. In other embodiments, N is selected to be any number from 6-10, 6-15, 6-20, 6-25, 6-30, 6-35, 6-40, 6-45, 6-50, 6-55, 6-60, 6-65, 6-70, 6-75, 6-80, or 6-86. In other embodiments, N is selected to be any number from 7-10, 7-15, 7-20, 7-25, 7-30, 7-35, 7-40, 7-45, 7-50, 7-55, 7-60, 7-65, 7-70, 7-75, 7-80, or 7-86. In other embodiments, N is selected to be any number from 8-10, 8-15, 8-20, 8-25, 8-30, 8-35, 8-40, 8-45, 8-50, 8-55, 8-60, 8-65, 8-70, 8-75, 8-80, or 8-86. In other embodiments, N is selected to be any number from 9-15, 9-20, 9-25, 9-30, 9-35, 9-40, 9-45, 9-50, 9-55, 9-60, 9-65, 9-70, 9-75, 9-80, or 9-86. In other embodiments, N is selected to be any number from 10-15, 10-20, 10-25, 10-30, 10-35, 10-40, 10-45, 10-50, 10-55, 10-60, 10-65, 10-70, 10-75, 10-80, or 10-86. It will be appreciated that N can be selected to encompass similar, but higher order, ranges.

In another aspect, a method is provided for diagnosing lung cancer in an individual, the method including detecting, in a biological sample from an individual, at least one biomarker value corresponding to at least one biomarker selected from the group of biomarkers provided in Table 21, wherein the individual is classified as having lung cancer based on the at least one biomarker value.

In another aspect, a method is provided for diagnosing lung cancer in an individual, the method including detecting, in a biological sample from an individual, biomarker values that each correspond to one of at least N biomarkers selected from the group of biomarkers set forth in Table 21, wherein the likelihood of the individual having lung cancer is determined based on the biomarker values.

In another aspect, a method is provided for diagnosing lung cancer in an individual, the method including detecting, in a biological sample from an individual, biomarker values that each correspond to one of at least N biomarkers selected from the group of biomarkers set forth in Table 21, wherein the individual is classified as having lung cancer based on the biomarker values, and wherein N=2-10.

In another aspect, a method is provided for diagnosing lung cancer in an individual, the method including detecting, in a biological sample from an individual, biomarker values that each correspond to one of at least N biomarkers selected from the group of biomarkers set forth in Table 21, wherein the likelihood of the individual having lung cancer is determined based on the biomarker values, and wherein N=2-10.

In another aspect, a method is provided for screening smokers for lung cancer, the method including detecting, in a biological sample from an individual who is a smoker, at least one biomarker value corresponding to at least one biomarker selected from the group of biomarkers set forth in Table 21, wherein the individual is classified as having lung cancer, or the likelihood of the individual having lung cancer is determined, based on the at least one biomarker value.

In another aspect, a method is provided for screening smokers for lung cancer, the method including detecting, in a biological sample from an individual who is a smoker, biomarker values that each correspond to one of at least N biomarkers selected from the group of biomarkers set forth in Table 21, wherein the individual is classified as having lung cancer, or the likelihood of the individual having lung cancer is determined, based on said biomarker values, wherein N=2-10.

In another aspect, a method is provided for diagnosing that an individual does not have lung cancer, the method including detecting, in a biological sample from an individual, at least one biomarker value corresponding to at least one biomarker selected from the group of biomarkers set forth in Table 21, wherein the individual is classified as not having lung cancer based on the at least one biomarker value.

In another aspect, a method is provided for diagnosing that an individual does not have lung cancer, the method including detecting, in a biological sample from an individual, biomarker values that each correspond to one of at least N biomarkers selected from the group of biomarkers set forth in Table 21, wherein the individual is classified as not having lung cancer based on the biomarker values, and wherein N=2-10.

In another aspect, a method is provided for diagnosing lung cancer, the method including detecting, in a biological sample from an individual, biomarker values that each correspond to a biomarker on a panel of N biomarkers, wherein the biomarkers are selected from the group of biomarkers set forth in Table 21, wherein a classification of the biomarker values indicates that the individual has lung cancer, and wherein N=3-10.

In another aspect, a method is provided for diagnosing lung cancer, the method including detecting, in a biological sample from an individual, biomarker values that each correspond to a biomarker on a panel of N biomarkers, wherein the biomarkers are selected from the group of biomarkers set forth in Table 21, wherein a classification of the biomarker values indicates that the individual has lung cancer, and wherein N=3-15.

In another aspect, a method is provided for screening smokers for lung cancer, the method including detecting, in a biological sample from an individual who is a smoker, biomarker values that each correspond to a biomarker on a panel of N biomarkers, wherein the biomarkers are selected from the group of biomarkers set forth in Table 21, wherein the individual is classified as having lung cancer, or the likelihood of the individual having lung cancer is determined, based on the biomarker values, and wherein N=3-10.

In another aspect, a method is provided for screening smokers for lung cancer, the method including detecting, in a biological sample from an individual who is a smoker, biomarker values that each correspond to a biomarker on a panel of N biomarkers, wherein the biomarkers are selected from the group of biomarkers set forth in Table 21, wherein the individual is classified as having lung cancer, or the likelihood of the individual having lung cancer is determined, based on the biomarker values, wherein N=3-15.

In another aspect, a method is provided for diagnosing an absence of lung cancer, the method including detecting, in a biological sample from an individual, biomarker values that each correspond to a biomarker on a panel of N biomarkers, wherein the biomarkers are selected from the group of biomarkers set forth in Table 21, wherein a classification of the biomarker values indicates an absence of lung cancer in the individual, and wherein N=3-10.

In another aspect, a method is provided for diagnosing an absence of lung cancer, the method including detecting, in a biological sample from an individual, biomarker values that each correspond to a biomarker on a panel of N biomarkers, wherein the biomarkers are selected from the group of biomarkers set forth in Table 21, wherein a classification of the biomarker values indicates an absence of lung cancer in the individual, and wherein N=3-15.

In another aspect, a method is provided for diagnosing lung cancer in an individual, the method including detecting, in a biological sample from an individual, biomarker values that correspond to one of at least N biomarkers selected from the group of biomarkers set forth in Table 21, wherein the individual is classified as having lung cancer based on a classification score that deviates from a predetermined threshold, and wherein N=2-10.

In another aspect, a method is provided for screening smokers for lung cancer, the method including detecting, in a biological sample from an individual who is a smoker, biomarker values that each correspond to a biomarker on a panel of N biomarkers, wherein the biomarkers are selected from the group of biomarkers set forth in Table 21, wherein the individual is classified as having lung cancer, or the likelihood of the individual having lung cancer is determined, based on a classification score that deviates from a predetermined threshold, wherein N=3-10.

In another aspect, a method is provided for screening smokers for lung cancer, the method including detecting, in a biological sample from an individual who is a smoker, biomarker values that each correspond to a biomarker on a panel of N biomarkers, wherein the biomarkers are selected from the group of biomarkers set forth in Table 21, wherein the individual is classified as having lung cancer, or the likelihood of the individual having lung cancer is determined, based on a classification score that deviates from a predetermined threshold, wherein N=3-15.

In another aspect, a method is provided for diagnosing an absence of lung cancer in an individual, the method including detecting, in a biological sample from an individual, biomarker values that correspond to one of at least N biomarkers selected from the group of biomarkers set forth in Table 21, wherein said individual is classified as not having lung cancer based on a classification score that deviates from a predetermined threshold, and wherein N=2-10.

In another aspect, a computer-implemented method is provided for indicating a likelihood of lung cancer. The method comprises: retrieving on a computer biomarker information for an individual, wherein the biomarker information comprises biomarker values that each correspond to one of at least N biomarkers, wherein N is as defined above, selected from the group of biomarkers set forth in Table 21; performing with the computer a classification of each of the biomarker values; and indicating a likelihood that the individual has lung cancer based upon a plurality of classifications.

In another aspect, a computer-implemented method is provided for classifying an individual as either having or not having lung cancer. The method comprises: retrieving on a computer biomarker information for an individual, wherein the biomarker information comprises biomarker values that each correspond to one of at least N biomarkers selected from the group of biomarkers provided in Table 21; performing with the computer a classification of each of the biomarker values; and indicating whether the individual has lung cancer based upon a plurality of classifications.

In another aspect, a computer program product is provided for indicating a likelihood of lung cancer. The computer program product includes a computer readable medium embodying program code executable by a processor of a computing device or system, the program code comprising: code that retrieves data attributed to a biological sample from an individual, wherein the data comprises biomarker values that each correspond to one of at least N biomarkers, wherein N is as defined above, in the biological sample selected from the group of biomarkers set forth in Table 21; and code that executes a classification method that indicates a likelihood that the individual has lung cancer as a function of the biomarker values.

In another aspect, a computer program product is provided for indicating a lung cancer status of an individual. The computer program product includes a computer readable medium embodying program code executable by a processor of a computing device or system, the program code comprising: code that retrieves data attributed to a biological sample from an individual, wherein the data comprises biomarker values that each correspond to one of at least N biomarkers in the biological sample selected from the group of biomarkers provided in Table 21; and code that executes a classification method that indicates a lung cancer status of the individual as a function of the biomarker values.

In another aspect, a computer-implemented method is provided for indicating a likelihood of lung cancer. The method comprises retrieving on a computer biomarker information for an individual, wherein the biomarker information comprises a biomarker value corresponding to a biomarker selected from the group of biomarkers set forth in Table 21; performing with the computer a classification of the biomarker value; and indicating a likelihood that the individual has lung cancer based upon the classification.

In another aspect, a computer-implemented method is provided forclassifying an individual as either having or not having lung cancer.The method comprises retrieving from a computer biomarker informationfor an individual, wherein the biomarker information comprises abiomarker value corresponding to a biomarker selected from the group ofbiomarkers provided in Table 21; performing with the computer aclassification of the biomarker value; and indicating whether theindividual has lung cancer based upon the classification.

In still another aspect, a computer program product is provided forindicating a likelihood of lung cancer. The computer program productincludes a computer readable medium embodying program code executable bya processor of a computing device or system, the program codecomprising: code that retrieves data attributed to a biological samplefrom an individual, wherein the data comprises a biomarker valuecorresponding to a biomarker in the biological sample selected from thegroup of biomarkers set forth in Table 21; and code that executes aclassification method that indicates a likelihood that the individualhas lung cancer as a function of the biomarker value.

In still another aspect, a computer program product is provided forindicating a lung cancer status of an individual. The computer programproduct includes a computer readable medium embodying program codeexecutable by a processor of a computing device or system, the programcode comprising: code that retrieves data attributed to a biologicalsample from an individual, wherein the data comprises a biomarker valuecorresponding to a biomarker in the biological sample selected from thegroup of biomarkers provided in Table 21; and code that executes aclassification method that indicates a lung cancer status of theindividual as a function of the biomarker value.

In one aspect of the application, at least one of said N biomarkers selected from Table 21 in each of the above methods is a biomarker selected from Table 20. In yet another embodiment, said biomarker selected from Table 20 is MMP-12.

In another aspect, a method is provided for diagnosing lung cancer, the method including detecting, in a biological sample from an individual, biomarker values that each correspond to a biomarker on a panel of biomarkers selected from the group of panels set forth in Tables 22-25, wherein a classification of the biomarker values indicates that the individual has lung cancer.

In another aspect, a method is provided for diagnosing an absence of lung cancer, the method including detecting, in a biological sample from an individual, biomarker values that each correspond to a biomarker on a panel of biomarkers selected from the group of panels provided in Tables 22-25, wherein a classification of the biomarker values indicates an absence of lung cancer in the individual.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A is a flowchart for an exemplary method for detecting lung cancer in a biological sample.

FIG. 1B is a flowchart for an exemplary method for detecting lung cancer in a biological sample using a naïve Bayes classification method.

FIG. 2 shows a ROC curve for a single biomarker, SCFsR, using a naïve Bayes classifier for a test that detects lung cancer in asymptomatic smokers.

FIG. 3 shows ROC curves for biomarker panels of from one to ten biomarkers using naïve Bayes classifiers for a test that detects lung cancer in asymptomatic smokers.

FIG. 4 illustrates the increase in the classification score (specificity+sensitivity) as the number of biomarkers is increased from one to ten using naïve Bayes classification for a benign nodule-lung cancer panel.

FIG. 5 shows the measured biomarker distributions for SCFsR as a cumulative distribution function (cdf) in log-transformed RFU for the benign nodule control group (solid line) and the lung cancer disease group (dotted line) along with their curve fits to a normal cdf (dashed lines) used to train the naïve Bayes classifiers.

FIG. 6 illustrates an exemplary computer system for use with various computer-implemented methods described herein.

FIG. 7 is a flowchart for a method of indicating the likelihood that an individual has lung cancer in accordance with one embodiment.

FIG. 8 is a flowchart for a method of indicating the likelihood that an individual has lung cancer in accordance with one embodiment.

FIG. 9 illustrates an exemplary aptamer assay that can be used to detect one or more lung cancer biomarkers in a biological sample.

FIG. 10 shows a histogram of frequencies for which biomarkers were used in building classifiers to distinguish between NSCLC and benign nodules from an aggregated set of potential biomarkers.

FIG. 11 shows a histogram of frequencies for which biomarkers were used in building classifiers to distinguish between NSCLC and asymptomatic smokers from an aggregated set of potential biomarkers.

FIG. 12 shows a histogram of frequencies for which biomarkers were used in building classifiers to distinguish between NSCLC and benign nodules from a site-consistent set of potential biomarkers.

FIG. 13 shows a histogram of frequencies for which biomarkers were used in building classifiers to distinguish between NSCLC and asymptomatic smokers from a site-consistent set of potential biomarkers.

FIG. 14 shows a histogram of frequencies for which biomarkers were used in building classifiers to distinguish between NSCLC and benign nodules from a set of potential biomarkers resulting from a combination of aggregated and site-consistent markers.

FIG. 15 shows a histogram of frequencies for which biomarkers were used in building classifiers to distinguish between NSCLC and asymptomatic smokers from a set of potential biomarkers resulting from a combination of aggregated and site-consistent markers.

FIG. 16 shows gel images resulting from pull-down experiments that illustrate the specificity of aptamers as capture reagents for the proteins LBP, C9 and IgM. For each gel, lane 1 is the eluate from the Streptavidin-agarose beads, lane 2 is the final eluate, and lane 3 is a MW marker lane (major bands are at 110, 50, 30, 15, and 3.5 kDa from top to bottom).

FIG. 17A shows a pair of histograms summarizing all possible single-protein naïve Bayes classifier scores (sensitivity+specificity) using the biomarkers set forth in Table 1, Col 5 (solid) and a set of random markers (dotted).

FIG. 17B shows a pair of histograms summarizing all possible two-protein naïve Bayes classifier scores (sensitivity+specificity) using the biomarkers set forth in Table 1, Col 5 (solid) and a set of random markers (dotted).

FIG. 17C shows a pair of histograms summarizing all possible three-protein naïve Bayes classifier scores (sensitivity+specificity) using the biomarkers set forth in Table 1, Col 5 (solid) and a set of random markers (dotted).

FIG. 18A shows a pair of histograms summarizing all possible single-protein naïve Bayes classifier scores (sensitivity+specificity) using the biomarkers set forth in Table 1, Col 6 (solid) and a set of random markers (dotted).

FIG. 18B shows a pair of histograms summarizing all possible two-protein naïve Bayes classifier scores (sensitivity+specificity) using the biomarkers set forth in Table 1, Col 6 (solid) and a set of random markers (dotted).

FIG. 18C shows a pair of histograms summarizing all possible three-protein naïve Bayes classifier scores (sensitivity+specificity) using the biomarkers set forth in Table 1, Col 6 (solid) and a set of random markers (dotted).

FIG. 19A shows the sensitivity+specificity score for naïve Bayes classifiers using from 2-10 markers selected from the full panel (♦) and the scores obtained by dropping the best 5 (▪), 10 (▴) and 15 (x) markers during classifier generation for the benign nodule control group.

FIG. 19B shows the sensitivity+specificity score for naïve Bayes classifiers using from 2-10 markers selected from the full panel (♦) and the scores obtained by dropping the best 5 (▪), 10 (▴) and 15 (x) markers during classifier generation for the smoker control group.

FIG. 20A shows a set of ROC curves modeled from the data in Tables 38 and 39 for panels of from one to five markers.

FIG. 20B shows a set of ROC curves computed from the training data for panels of from one to five markers as in FIG. 19A.

FIG. 21 shows relative changes in protein expression for 813 proteins from eight NSCLC resection samples between adjacent and distant tissue (FIG. 21A), tumor and adjacent tissue (FIG. 21B) and tumor and distant tissue (FIG. 21C) expressed as log2 median ratios. The dotted line represents a two-fold change (log2 = 1).

FIG. 22 shows a heat map of protein levels in tumor tissue samples. The samples are arranged in columns and are separated into distant, adjacent, and tumor samples. Within each tissue type, the samples are separated into adenocarcinoma (AC) and squamous cell carcinoma (SCC). The numbers above each column correspond to patient codes. The proteins are displayed in rows and were ordered using hierarchical clustering.

FIG. 23 (A-T) depicts proteins with increased levels in tumor tissue compared with adjacent or distal tissue.

FIG. 24 (A-P) depicts proteins with decreased levels in tumor tissue compared with adjacent or distal tissue from the eight NSCLC samples used in this study.

FIG. 25 shows SOMAmer histochemistry on frozen tissue sections for selected biomarkers detected in this study. (A) Thrombospondin-2 (red) staining the fibrocollagenous matrix surrounding a tumor nest. (B) Corresponding normal lung specimen stained with Thrombospondin-2 SOMAmer (red). (C) Macrophage Mannose Receptor SOMAmer (red) staining scattered macrophages in a lung adenocarcinoma. (D) Macrophage Mannose Receptor SOMAmer (red) staining numerous alveolar macrophages in a section of normal lung parenchyma. (E) Multicolor image highlighting the cytomorphologic distribution of Macrophage Mannose Receptor SOMAmer staining: Green=Cytokeratin (AE1/AE3 antibody), Red=CD31 (EP3095 Antibody), and Orange=SOMAmer. All nuclei in this figure are counterstained with DAPI.

FIG. 26 shows changes in protein expression in NSCLC tissue compared to serum. The top two panels show the log2 ratio (LR) derived from serum samples versus log ratios derived from adjacent tissue and distant tissue, respectively. The bottom four panels feature zoomed portions of the plots above, indicated by the color of the plot (green for decreased and red for increased expression compared to non-tumor tissue). Analytes shown in FIGS. 23 and 24 have been labeled, and analytes mentioned in the publication on the serum samples are shown as filled red symbols.

FIG. 27 depicts thrombospondin-2 histochemical identification in tissue samples. Thrombospondin-2 is identified in a serial frozen section of a single lung carcinoma specimen by (A) a home-made rabbit polyclonal thrombospondin-2 antibody, (B) the pre-immune serum from rabbits used to make the home-made polyclonal antibody, (C) a commercial (Novus) rabbit polyclonal thrombospondin-2 antibody, and (D) the thrombospondin-2 SOMAmer. The thrombospondin-2 SOMAmer was then used to stain frozen sections of normal and malignant lung tissue, with standard Avidin-Biotin-Peroxidase color development, to demonstrate different morphologic distributions: (E) strong staining of the fibrotic stroma surrounding tumor nests, with minimal cytosolic staining of carcinoma cells, (F) strong staining of the fibrotic stroma surrounding tumor nests in a mucinous adenocarcinoma, with no significant staining of the carcinoma cells, (G) normal lung tissue, showing strong cytosolic staining of bronchial epithelium and scattered alveolar macrophages, and (H) strong cytosolic staining of an adenocarcinoma, with no significant staining of the non-fibrotic, predominantly inflammatory stroma.

DETAILED DESCRIPTION

The practice of the invention disclosed herein employs, unless otherwise indicated, conventional methods of chemistry, microbiology, molecular biology, and recombinant DNA techniques within the level of skill in the art. Such techniques are explained fully in the literature. See, e.g., Sambrook, et al. Molecular Cloning: A Laboratory Manual (Current Edition); DNA Cloning: A Practical Approach, vol. I & II (D. Glover, ed.); Oligonucleotide Synthesis (N. Gait, ed., Current Edition); Nucleic Acid Hybridization (B. Hames & S. Higgins, eds., Current Edition); Transcription and Translation (B. Hames & S. Higgins, eds., Current Edition); Histology for Pathologists (S. E. Mills, Current Edition). All publications, published patent documents, and patent applications cited in this specification are indicative of the level of skill in the art(s) to which the invention pertains. All publications, published patent documents, and patent applications cited herein are hereby incorporated by reference to the same extent as though each individual publication, published patent document, or patent application was specifically and individually indicated as being incorporated by reference.

Reference will now be made in detail to representative embodiments of the invention. While the invention will be described in conjunction with the enumerated embodiments, it will be understood that the invention is not intended to be limited to those embodiments. On the contrary, the invention is intended to cover all alternatives, modifications, and equivalents that may be included within the scope of the present invention as defined by the claims.

Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods, devices, and materials similar or equivalent to those described herein can be used in the practice or testing of the invention, the preferred methods, devices and materials are now described.

As used in this application, including the appended claims, the singular forms “a,” “an,” and “the” include plural references, unless the content clearly dictates otherwise, and are used interchangeably with “at least one” and “one or more.” Thus, reference to “an aptamer” includes mixtures of aptamers, reference to “a probe” includes mixtures of probes, and the like.

As used herein, the term “about” represents an insignificant modification or variation of the numerical value such that the basic function of the item to which the numerical value relates is unchanged.

As used herein, the terms “comprises,” “comprising,” “includes,” “including,” “contains,” “containing,” and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, product-by-process, or composition of matter that comprises, includes, or contains an element or list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, product-by-process, or composition of matter.

The present application includes biomarkers, methods, devices, reagents, systems, and kits for the detection and diagnosis of lung cancer.

In one aspect, one or more biomarkers are provided for use either alone or in various combinations to diagnose lung cancer, permit the differential diagnosis of pulmonary nodules as benign or malignant, monitor lung cancer recurrence, or address other clinical indications. In other aspects, said biomarker(s) can be used in determining information about lung cancer in an individual, such as prognosis, cancer classification, prediction of disease risk, or selection of treatment. As described in detail below, exemplary embodiments include the biomarkers provided in Tables 18 and 21, which were identified using a multiplex aptamer-based assay that is described generally in Example 1 and more specifically in Examples 2 and 6. Each of the biomarkers is useful in assaying any type of sample as defined below.

Table 1, Col. 2 sets forth the findings obtained from analyzing hundreds of individual blood samples from NSCLC cancer cases, and hundreds of equivalent individual blood samples from smokers and from individuals diagnosed with benign lung nodules. The smoker and benign nodule groups were designed to match the populations with which a lung cancer diagnostic test can have the most benefit. (These cases and controls were obtained from multiple clinical sites to mimic the range of real world conditions under which such a test can be applied.) The potential biomarkers were measured in individual samples rather than pooling the disease and control blood; this allowed a better understanding of the individual and group variations in the phenotypes associated with the presence and absence of disease (in this case lung cancer). Since over 800 protein measurements were made on each sample, and several hundred samples from each of the disease and the control populations were individually measured, Table 1, Col. 2 resulted from an analysis of an uncommonly large set of data. The measurements were analyzed using the methods described in the section, “Classification of Biomarkers and Calculation of Disease Scores” herein.

Table 1, Col. 2 lists the biomarkers found to be useful in distinguishing samples obtained from individuals with NSCLC from “control” samples obtained from smokers and individuals with benign lung nodules. Using a multiplex aptamer assay as described herein, thirty-eight biomarkers were discovered that distinguished the samples obtained from individuals who had lung cancer from the samples obtained from individuals in the smoker control group (see Table 1, Col. 6). Similarly, using a multiplex aptamer assay, forty biomarkers were discovered that distinguished samples obtained from individuals with NSCLC from samples obtained from people who had benign lung nodules (see Table 1, Col. 5). Together, the two lists of 38 and 40 biomarkers comprise 61 unique biomarkers, because there is considerable overlap between the list of biomarkers for distinguishing NSCLC from benign nodules and the list for distinguishing NSCLC from smokers who do not have lung cancer.

Table 18 sets forth the findings obtained from analyzing eight individual tissue samples of smokers diagnosed with NSCLC as described in Example 6. All of the patients were smokers ranging from 47 to 75 years old and covering NSCLC stages 1A through 3B. Three samples were obtained from each individual: tumor tissue, adjacent healthy tissue (within 1 cm of the tumor) and distant uninvolved lung tissue. The samples were chosen to match the populations with which a lung cancer diagnostic test can have the most benefit. The potential biomarkers were measured in individual samples rather than pooling the disease and control tissue; this allowed a better understanding of the individual and group variations in the phenotypes associated with the presence and absence of disease (in this case lung cancer). The measurements were analyzed using the Mann-Whitney test.
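For readers unfamiliar with the Mann-Whitney test, the statistic it relies on can be sketched in a few lines. The example below is an illustration only: the log-scale signal values are invented, the function name `mann_whitney_u` is a hypothetical helper, and the sketch computes only the U statistic (no p-value), so it is not a reproduction of the analysis performed in Example 6.

```python
from typing import Sequence


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """U statistic for group x: the number of (x_i, y_j) pairs in which
    x_i exceeds y_j, with each tied pair counted as one half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Invented log-scale signals for one protein in eight tumor samples and
# eight non-tumor samples (assumed values, for illustration only).
tumor = [9.1, 8.7, 9.4, 8.9, 9.8, 9.0, 9.3, 8.8]
non_tumor = [8.1, 8.4, 7.9, 8.6, 8.2, 8.0, 8.5, 8.3]

u_tumor = mann_whitney_u(tumor, non_tumor)
u_non_tumor = mann_whitney_u(non_tumor, tumor)

# The two U statistics always sum to len(x) * len(y); a U close to either
# extreme indicates that the two groups are well separated.
print(u_tumor, u_non_tumor)
```

In practice a statistics library would normally be used to obtain a p-value from U; the pairwise count is shown here only to make the idea behind the test concrete.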

Table 18 lists the biomarkers found to be useful in distinguishing samples obtained from individuals with NSCLC from “control” samples obtained from adjacent and distal uninvolved lung tissue obtained from the same individuals. Using a multiplex aptamer assay as described herein, thirty-six biomarkers were discovered that distinguished the tumor tissue samples from samples obtained from adjacent and distal lung tissue in individuals who had been diagnosed with NSCLC. With reference to Table 1, Col. 2, it can be seen that eleven of the biomarkers overlap those identified in serum samples as described in Example 2. An additional marker which was not measured in the original serum profiling, MMP-12, has since been found to be a useful biomarker in both serum and in tissue. Table 21 provides a list of the total number of biomarkers (eighty-six) identified in both the serum and tumor tissue samples combined. Table 20 provides a list of the biomarkers identified which were unique to the tumor tissue samples (twenty-five).
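The biomarker counts quoted above can be cross-checked with simple inclusion-exclusion arithmetic. The short sketch below uses only the counts stated in the text; the variable names are illustrative, and the assumption that the eleven overlapping tissue markers all belong to the 61 serum markers is an interpretation of the text rather than something read from the tables themselves.

```python
# Counts stated in the text.
serum_vs_smokers = 38   # Table 1, Col. 6
serum_vs_benign = 40    # Table 1, Col. 5
serum_unique = 61       # unique markers across the two serum lists

# Inclusion-exclusion: markers shared by the two serum lists.
serum_overlap = serum_vs_smokers + serum_vs_benign - serum_unique

tissue_total = 36          # Table 18
tissue_also_in_serum = 11  # tissue markers also found in serum (assumed subset of the 61)

# Markers unique to tissue, and the combined serum-plus-tissue total.
tissue_only = tissue_total - tissue_also_in_serum
combined_total = serum_unique + tissue_only

print(serum_overlap, tissue_only, combined_total)
```

Under these assumptions the derived figures agree with the twenty-five tissue-unique biomarkers of Table 20 and the eighty-six combined biomarkers of Table 21.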

While certain of the described lung cancer biomarkers are useful alone for detecting and diagnosing lung cancer, methods are also described herein for the grouping of multiple subsets of the lung cancer biomarkers, where each grouping or subset selection is useful as a panel of three or more biomarkers, interchangeably referred to herein as a “biomarker panel” and a “panel”. Thus, various embodiments of the instant application provide combinations comprising N biomarkers, wherein N is at least two biomarkers. In other embodiments, N is selected from 2-86 biomarkers (Table 21); 2-36 biomarkers (Table 18) or 2-25 biomarkers (Table 20). In other embodiments, N is selected from 2-86 (Table 21) and at least one of said N biomarkers is MMP-12. In other embodiments, N is selected from 2-25 (Table 20) and at least one of said N biomarkers is MMP-12. Representative panels of 2-5 biomarkers including MMP-12 as one of the markers are set forth in Tables 22-25.

In yet other embodiments, the biomarkers are selected from those listed in Table 18 and N is selected to be any number from 2-7, 2-10, 2-15, 2-20, 2-25, 2-30, or 2-36. In other embodiments, N is selected to be any number from 3-7, 3-10, 3-15, 3-20, 3-25, 3-30, or 3-36. In other embodiments, N is selected to be any number from 4-7, 4-10, 4-15, 4-20, 4-25, 4-30, or 4-36. In other embodiments, N is selected to be any number from 5-7, 5-10, 5-15, 5-20, 5-25, 5-30, or 5-36. In other embodiments, N is selected to be any number from 6-10, 6-15, 6-20, 6-25, 6-30, or 6-36. In other embodiments, N is selected to be any number from 7-10, 7-15, 7-20, 7-25, 7-30, or 7-36. In other embodiments, N is selected to be any number from 8-10, 8-15, 8-20, 8-25, 8-30, or 8-36. In other embodiments, N is selected to be any number from 9-15, 9-20, 9-25, 9-30, or 9-36. In other embodiments, N is selected to be any number from 10-15, 10-20, 10-25, 10-30, or 10-36. It will be appreciated that N can be selected to encompass similar, but higher order, ranges.

In yet other embodiments, the biomarkers are selected from those listed in Table 20 and N is selected to be any number from 2-7, 2-10, 2-15, 2-20, or 2-25. In other embodiments, N is selected to be any number from 3-7, 3-10, 3-15, 3-20, or 3-25. In other embodiments, N is selected to be any number from 4-7, 4-10, 4-15, 4-20, or 4-25. In other embodiments, N is selected to be any number from 5-7, 5-10, 5-15, 5-20, or 5-25. In other embodiments, N is selected to be any number from 6-10, 6-15, 6-20, or 6-25. In other embodiments, N is selected to be any number from 7-10, 7-15, 7-20, or 7-25. In other embodiments, N is selected to be any number from 8-10, 8-15, 8-20, or 8-25. In other embodiments, N is selected to be any number from 9-15, 9-20, or 9-25. In other embodiments, N is selected to be any number from 10-15, 10-20, or 10-25. It will be appreciated that N can be selected to encompass similar, but higher order, ranges.

In yet other embodiments, the biomarkers are selected from those listed in Table 21 and N is selected to be any number from 2-7, 2-10, 2-15, 2-20, 2-25, 2-30, 2-35, 2-40, 2-45, 2-50, 2-55, 2-60, 2-65, 2-70, 2-75, 2-80, or 2-86. In other embodiments, N is selected to be any number from 3-7, 3-10, 3-15, 3-20, 3-25, 3-30, 3-35, 3-40, 3-45, 3-50, 3-55, 3-60, 3-65, 3-70, 3-75, 3-80, or 3-86. In other embodiments, N is selected to be any number from 4-7, 4-10, 4-15, 4-20, 4-25, 4-30, 4-35, 4-40, 4-45, 4-50, 4-55, 4-60, 4-65, 4-70, 4-75, 4-80, or 4-86. In other embodiments, N is selected to be any number from 5-7, 5-10, 5-15, 5-20, 5-25, 5-30, 5-35, 5-40, 5-45, 5-50, 5-55, 5-60, 5-65, 5-70, 5-75, 5-80, or 5-86. In other embodiments, N is selected to be any number from 6-10, 6-15, 6-20, 6-25, 6-30, 6-35, 6-40, 6-45, 6-50, 6-55, 6-60, 6-65, 6-70, 6-75, 6-80, or 6-86. In other embodiments, N is selected to be any number from 7-10, 7-15, 7-20, 7-25, 7-30, 7-35, 7-40, 7-45, 7-50, 7-55, 7-60, 7-65, 7-70, 7-75, 7-80, or 7-86. In other embodiments, N is selected to be any number from 8-10, 8-15, 8-20, 8-25, 8-30, 8-35, 8-40, 8-45, 8-50, 8-55, 8-60, 8-65, 8-70, 8-75, 8-80, or 8-86. In other embodiments, N is selected to be any number from 9-15, 9-20, 9-25, 9-30, 9-35, 9-40, 9-45, 9-50, 9-55, 9-60, 9-65, 9-70, 9-75, 9-80, or 9-86. In other embodiments, N is selected to be any number from 10-15, 10-20, 10-25, 10-30, 10-35, 10-40, 10-45, 10-50, 10-55, 10-60, 10-65, 10-70, 10-75, 10-80, or 10-86. It will be appreciated that N can be selected to encompass similar, but higher order, ranges.
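To give a sense of how many distinct panels these ranges of N imply, the following sketch counts and enumerates candidate panels that are required to contain one fixed marker, such as MMP-12. The candidate list uses placeholder names rather than the actual contents of Table 20 or Table 21, and the helper `panels_with_required` is hypothetical; only the table sizes (86 and 25) are taken from the text.

```python
from itertools import combinations
from math import comb
from typing import List, Sequence, Tuple


def panels_with_required(candidates: Sequence[str], required: str,
                         size: int) -> List[Tuple[str, ...]]:
    """Enumerate all panels of the given size that contain `required`."""
    others = [m for m in candidates if m != required]
    return [(required,) + rest for rest in combinations(others, size - 1)]


# Placeholder candidate list (not the real table contents).
candidates = ["MMP-12", "marker_B", "marker_C", "marker_D", "marker_E"]
two_marker_panels = panels_with_required(candidates, "MMP-12", 2)
three_marker_panels = panels_with_required(candidates, "MMP-12", 3)

# Number of possible N-marker panels drawn from a table of a given size,
# with and without the requirement that one fixed marker is included.
for table_size in (25, 86):
    for n in (2, 3, 4, 5):
        any_panel = comb(table_size, n)
        with_fixed_marker = comb(table_size - 1, n - 1)
        print(table_size, n, any_panel, with_fixed_marker)
```

Because the number of candidate panels grows combinatorially with N, exhaustive enumeration is practical only for small panels; larger panels would typically be built with a search heuristic.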

In one embodiment, the number of biomarkers useful for a biomarker subset or panel is based on the sensitivity and specificity value for the particular combination of biomarker values. The terms “sensitivity” and “specificity” are used herein with respect to the ability to correctly classify an individual, based on one or more biomarker values detected in their biological sample, as having lung cancer or not having lung cancer. “Sensitivity” indicates the performance of the biomarker(s) with respect to correctly classifying individuals that have lung cancer. “Specificity” indicates the performance of the biomarker(s) with respect to correctly classifying individuals who do not have lung cancer. For example, 85% specificity and 90% sensitivity for a panel of markers used to test a set of control samples and lung cancer samples indicates that 85% of the control samples were correctly classified as control samples by the panel, and 90% of the lung cancer samples were correctly classified as lung cancer samples by the panel. The desired or preferred minimum value can be determined as described in Example 3.
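The definitions of sensitivity and specificity given above can be illustrated with a short calculation. The sample counts below are invented so that the result matches the 90% sensitivity and 85% specificity of the example in the text; the function name `sensitivity_specificity` is an assumption made for this sketch.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(truth: Sequence[bool],
                            predicted: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity).

    truth[i] is True when sample i is a lung cancer sample;
    predicted[i] is True when the panel classifies it as lung cancer.
    """
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


# Invented test set: 20 lung cancer samples (18 called correctly)
# followed by 20 control samples (17 called correctly).
truth = [True] * 20 + [False] * 20
predicted = [True] * 18 + [False] * 2 + [False] * 17 + [True] * 3

sens, spec = sensitivity_specificity(truth, predicted)
score = sens + spec  # the combined score used in several of the figures
```

The sum of the two values is the "sensitivity+specificity" classification score referred to in the figure descriptions.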

In one aspect, lung cancer is detected or diagnosed in an individual by conducting an assay on a biological sample from the individual and detecting biomarker values that each correspond to at least one of the biomarkers MMP-7, MMP-12, or IGFBP-2 and at least N additional biomarkers selected from the list of biomarkers in Table 21, wherein N equals 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15. In a further aspect, lung cancer is detected or diagnosed in an individual by conducting an assay on a biological sample from the individual and detecting biomarker values that each correspond to the biomarkers MMP-7, MMP-12, or IGFBP-2 and one of at least N additional biomarkers selected from the list of biomarkers in Table 21, wherein N equals 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or 13. In a further aspect, lung cancer is detected or diagnosed in an individual by conducting an assay on a biological sample from the individual and detecting biomarker values that each correspond to the biomarker MMP-7 and one of at least N additional biomarkers selected from the list of biomarkers in Table 21, wherein N equals 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15. In a further aspect, lung cancer is detected or diagnosed in an individual by conducting an assay on a biological sample from the individual and detecting biomarker values that each correspond to the biomarker MMP-12 and one of at least N additional biomarkers selected from the list of biomarkers in Table 21, wherein N equals 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15. In a further aspect, lung cancer is detected or diagnosed in an individual by conducting an assay on a biological sample from the individual and detecting biomarker values that each correspond to the biomarker IGFBP-2 and one of at least N additional biomarkers selected from the list of biomarkers in Table 21, wherein N equals 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15.

The lung cancer biomarkers identified herein represent a relatively large number of choices for subsets or panels of biomarkers that can be used to effectively detect or diagnose lung cancer. Selection of the desired number of such biomarkers depends on the specific combination of biomarkers chosen. It is important to remember that panels of biomarkers for detecting or diagnosing lung cancer may also include biomarkers not found in Tables 18, 20 or 21, and that the inclusion of additional biomarkers not found in Tables 18, 20 or 21 may reduce the number of biomarkers in the particular subset or panel that is selected from Tables 18, 20 or 21. The number of biomarkers from Tables 18, 20 or 21 used in a subset or panel may also be reduced if additional biomedical information is used in conjunction with the biomarker values to establish acceptable sensitivity and specificity values for a given assay.

Another factor that can affect the number of biomarkers to be used in a subset or panel of biomarkers is the procedure used to obtain biological samples from individuals who are being diagnosed for lung cancer. In a carefully controlled sample procurement environment, the number of biomarkers necessary to meet desired sensitivity and specificity values will be lower than in a situation where there can be more variation in sample collection, handling and storage. In developing the list of biomarkers set forth in Tables 18, 20 or 21, multiple sample collection sites were utilized to collect data for classifier training. This provides for more robust biomarkers that are less sensitive to variations in sample collection, handling and storage, but can also require that the number of biomarkers in a subset or panel be larger than if the training data were all obtained under very similar conditions.

One aspect of the instant application can be described generally with reference to FIGS. 1A and 1B. A biological sample is obtained from an individual or individuals of interest. The biological sample is then assayed to detect the presence of one or more (N) biomarkers of interest and to determine a biomarker value for each of said N biomarkers (referred to in FIG. 1B as marker RFU). Once a biomarker has been detected and a biomarker value assigned, each marker is scored or classified as described in detail herein. The marker scores are then combined to provide a total diagnostic score, which indicates the likelihood that the individual from whom the sample was obtained has lung cancer.
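The flow from marker RFU to per-marker score to total diagnostic score can be sketched as a naïve Bayes calculation. The sketch assumes, in line with the description of FIG. 5, that log-transformed RFU values are modeled by a normal distribution in each of the control and disease groups; the marker names, distribution parameters, prior, and all function names are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MarkerModel:
    """Normal-distribution parameters for log(RFU) in each group (assumed)."""
    mean_control: float
    sd_control: float
    mean_disease: float
    sd_disease: float


def log_normal_pdf(x: float, mean: float, sd: float) -> float:
    """Natural log of the normal probability density at x."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)


def marker_score(rfu: float, model: MarkerModel) -> float:
    """Per-marker score: log-likelihood ratio of disease versus control."""
    x = math.log(rfu)
    return (log_normal_pdf(x, model.mean_disease, model.sd_disease)
            - log_normal_pdf(x, model.mean_control, model.sd_control))


def total_score(rfus: Dict[str, float], models: Dict[str, MarkerModel],
                prior_disease: float = 0.5) -> float:
    """Sum the per-marker scores and add the log prior odds."""
    score = math.log(prior_disease / (1.0 - prior_disease))
    for name, rfu in rfus.items():
        score += marker_score(rfu, models[name])
    return score


def probability_of_disease(score: float) -> float:
    """Convert a log-odds score to a probability."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical two-marker panel: marker_A rises and marker_B falls in disease.
models = {
    "marker_A": MarkerModel(7.0, 0.5, 8.0, 0.5),
    "marker_B": MarkerModel(9.0, 0.4, 8.5, 0.4),
}
sample = {"marker_A": math.exp(8.0), "marker_B": math.exp(8.5)}
likelihood = probability_of_disease(total_score(sample, models))
```

Summing per-marker log-likelihood ratios reflects the naïve Bayes assumption that the markers are conditionally independent given disease status; the actual scoring procedure is the one described in detail elsewhere herein.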

As used herein, “lung” may be interchangeably referred to as “pulmonary”.

As used herein, “smoker” refers to an individual who has a history of tobacco smoke inhalation.

“Biological sample”, “sample”, and “test sample” are used interchangeably herein to refer to any material, biological fluid, tissue, or cell obtained or otherwise derived from an individual. This includes blood (including whole blood, leukocytes, peripheral blood mononuclear cells, buffy coat, plasma, and serum), sputum, tears, mucus, nasal washes, nasal aspirate, breath, urine, semen, saliva, meningeal fluid, amniotic fluid, glandular fluid, lymph fluid, nipple aspirate, bronchial aspirate, synovial fluid, joint aspirate, cells, a cellular extract, and cerebrospinal fluid. This also includes experimentally separated fractions of all of the preceding. For example, a blood sample can be fractionated into serum or into fractions containing particular types of blood cells, such as red blood cells or white blood cells (leukocytes). If desired, a sample can be a combination of samples from an individual, such as a combination of a tissue and fluid sample. The term “biological sample” also includes materials containing homogenized solid material, such as from a stool sample, a tissue sample, or a tissue biopsy, for example. The term “biological sample” also includes materials derived from a tissue culture or a cell culture. Any suitable methods for obtaining a biological sample can be employed; exemplary methods include, e.g., phlebotomy, swab (e.g., buccal swab), and a fine needle aspirate biopsy procedure. Exemplary tissues susceptible to fine needle aspiration include lymph node, lung, lung washes, BAL (bronchoalveolar lavage), thyroid, breast, and liver. Samples can also be collected, e.g., by micro dissection (e.g., laser capture micro dissection (LCM) or laser micro dissection (LMD)), bladder wash, smear (e.g., a PAP smear), or ductal lavage. A “biological sample” obtained or derived from an individual includes any such sample that has been processed in any suitable manner after being obtained from the individual.

A “Tissue sample” or “Tissue” refers to a certain subset of the biological samples described above. According to this definition, tissues are collections of macromolecules in a heterogeneous environment. As used herein, tissue refers to a single cell type, a collection of cell types, an aggregate of cells, or an aggregate of macromolecules. Tissues are generally a physical array of macromolecules that can be either fluid or rigid, both in terms of structure and composition. Extracellular matrix is an example of a more rigid tissue, both structurally and compositionally, while a membrane bilayer is more fluid in structure and composition. Tissue includes, but is not limited to, an aggregate of cells usually of a particular kind together with their intercellular substance that form one of the structural materials commonly used to denote the general cellular fabric of a given organ, e.g., kidney tissue, brain tissue, lung tissue. The four general classes of tissues are epithelial tissue, connective tissue, nerve tissue, and muscle tissue. Methods for identifying slow off-rate aptamers to tissue targets are described in International Application Pub. No. WO 2011/006075, published Jan. 13, 2011, which is incorporated herein by reference in its entirety.

Examples of tissues which fall within this definition include, but are not limited to, heterogeneous aggregates of macromolecules such as fibrin clots which are acellular; homogeneous or heterogeneous aggregates of cells; higher ordered structures containing cells which have a specific function, such as organs, tumors, lymph nodes, arteries, etc.; and individual cells. Tissues or cells can be in their natural environment, isolated, or in tissue culture. The tissue can be intact or modified. The modification can include numerous changes such as transformation, transfection, activation, and substructure isolation, e.g., cell membranes, cell nuclei, cell organelles, etc.

The tissue, cell, or subcellular structures can be obtained from prokaryotes as well as eukaryotes. This includes human, animal, plant, bacterial, fungal, and viral structures.

Further, it should be realized that a biological sample can be derived by taking biological samples from a number of individuals and pooling them or pooling an aliquot of each individual's biological sample. The pooled sample can be treated as a sample from a single individual and if the presence of cancer is established in the pooled sample, then each individual biological sample can be re-tested to determine which individual/s have lung cancer.
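The pool-then-re-test logic described above can be sketched as follows. This is a minimal illustration only: the function name and the two callables `pool_is_positive` and `sample_is_positive` are hypothetical stand-ins for whatever assay is actually used, and are not part of the described methods.

```python
from typing import Callable, List, Sequence


def find_positive_individuals(
    sample_ids: Sequence[str],
    pool_is_positive: Callable[[Sequence[str]], bool],
    sample_is_positive: Callable[[str], bool],
) -> List[str]:
    """Test a pooled sample first; re-test individuals only if the pool is positive.

    Both callables are hypothetical placeholders for a real assay.
    """
    # One test on the pooled sample, treated as if it came from one individual.
    if not pool_is_positive(sample_ids):
        return []
    # The pool is positive, so each individual sample is re-tested.
    return [sid for sid in sample_ids if sample_is_positive(sid)]


if __name__ == "__main__":
    # Invented status table used only to exercise the sketch.
    status = {"A": False, "B": True, "C": False}
    positives = find_positive_individuals(
        list(status),
        pool_is_positive=lambda ids: any(status[i] for i in ids),
        sample_is_positive=lambda i: status[i],
    )
    print(positives)
```

When the pool tests negative, only one assay is run for the whole group, which is the practical appeal of pooling.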

For purposes of this specification, the phrase “data attributed to a biological sample from an individual” is intended to mean that the data in some form were derived from, or were generated using, the biological sample of the individual. The data may have been reformatted, revised, or mathematically altered to some degree after having been generated, such as by conversion from units in one measurement system to units in another measurement system; but the data are understood to have been derived from, or generated using, the biological sample.

“Target”, “target molecule”, and “analyte” are used interchangeably herein to refer to any molecule of interest that may be present in a biological sample. A “molecule of interest” includes any minor variation of a particular molecule, such as, in the case of a protein, for example, minor variations in amino acid sequence, disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation or modification, such as conjugation with a labeling component, which does not substantially alter the identity of the molecule. A “target molecule”, “target”, or “analyte” is a set of copies of one type or species of molecule or multi-molecular structure. “Target molecules”, “targets”, and “analytes” refer to more than one such set of molecules. Exemplary target molecules include proteins, polypeptides, nucleic acids, carbohydrates, lipids, polysaccharides, glycoproteins, hormones, receptors, antigens, antibodies, affybodies, antibody mimics, viruses, pathogens, toxic substances, substrates, metabolites, transition state analogs, cofactors, inhibitors, drugs, dyes, nutrients, growth factors, cells, tissues, and any fragment or portion of any of the foregoing.

As used herein, “polypeptide,” “peptide,” and “protein” are used interchangeably to refer to polymers of amino acids of any length. The polymer may be linear or branched, it may comprise modified amino acids, and it may be interrupted by non-amino acids. The terms also encompass an amino acid polymer that has been modified naturally or by intervention; for example, disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation or modification, such as conjugation with a labeling component. Also included within the definition are, for example, polypeptides containing one or more analogs of an amino acid (including, for example, unnatural amino acids, etc.), as well as other modifications known in the art. Polypeptides can be single chains or associated chains. Also included within the definition are preproteins and intact mature proteins; peptides or polypeptides derived from a mature protein; fragments of a protein; splice variants; recombinant forms of a protein; protein variants with amino acid modifications, deletions, or substitutions; digests; and post-translational modifications, such as glycosylation, acetylation, phosphorylation, and the like.

As used herein, “marker” and “biomarker” are used interchangeably to refer to a target molecule that indicates or is a sign of a normal or abnormal process in an individual or of a disease or other condition in an individual. More specifically, a “marker” or “biomarker” is an anatomic, physiologic, biochemical, or molecular parameter associated with the presence of a specific physiological state or process, whether normal or abnormal, and, if abnormal, whether chronic or acute. Biomarkers are detectable and measurable by a variety of methods including laboratory assays and medical imaging. When a biomarker is a protein, it is also possible to use the expression of the corresponding gene as a surrogate measure of the amount or presence or absence of the corresponding protein biomarker in a biological sample, or to use the methylation state of the gene encoding the biomarker or of proteins that control expression of the biomarker.

As used herein, “biomarker value”, “value”, “biomarker level”, and “level” are used interchangeably to refer to a measurement that is made using any analytical method for detecting the biomarker in a biological sample and that indicates the presence, absence, absolute amount or concentration, relative amount or concentration, titer, a level, an expression level, a ratio of measured levels, or the like, of, for, or corresponding to the biomarker in the biological sample. The exact nature of the “value” or “level” depends on the specific design and components of the particular analytical method employed to detect the biomarker.

When a biomarker indicates or is a sign of an abnormal process or a disease or other condition in an individual, that biomarker is generally described as being either over-expressed or under-expressed as compared to an expression level or value of the biomarker that indicates or is a sign of a normal process or an absence of a disease or other condition in an individual.

“Up-regulation”, “up-regulated”, “over-expression”, “over-expressed”, and any variations thereof are used interchangeably to refer to a value or level of a biomarker in a biological sample that is greater than a value or level (or range of values or levels) of the biomarker that is typically detected in similar biological samples from healthy or normal individuals. The terms may also refer to a value or level of a biomarker in a biological sample that is greater than a value or level (or range of values or levels) of the biomarker that may be detected at a different stage of a particular disease.

“Down-regulation”, “down-regulated”, “under-expression”, “under-expressed”, and any variations thereof are used interchangeably to refer to a value or level of a biomarker in a biological sample that is less than a value or level (or range of values or levels) of the biomarker that is typically detected in similar biological samples from healthy or normal individuals. The terms may also refer to a value or level of a biomarker in a biological sample that is less than a value or level (or range of values or levels) of the biomarker that may be detected at a different stage of a particular disease.

Further, a biomarker that is either over-expressed or under-expressed can also be referred to as being “differentially expressed” or as having a “differential level” or “differential value” as compared to a “normal” expression level or value of the biomarker that indicates or is a sign of a normal process or an absence of a disease or other condition in an individual. Thus, “differential expression” of a biomarker can also be referred to as a variation from a “normal” expression level of the biomarker.
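As a rough illustration of the definitions above, a measured biomarker value can be compared against a range of values typically seen in healthy individuals. The function name, the range bounds, and the numbers in the usage lines are assumptions made for this sketch; no reference ranges are given in the text.

```python
def classify_expression(value: float, normal_low: float, normal_high: float) -> str:
    """Label a biomarker value relative to a hypothetical normal range.

    Returns "over-expressed" above the range, "under-expressed" below it,
    and "normal" inside it (bounds inclusive).
    """
    if normal_low > normal_high:
        raise ValueError("normal_low must not exceed normal_high")
    if value > normal_high:
        return "over-expressed"
    if value < normal_low:
        return "under-expressed"
    return "normal"


def is_differentially_expressed(value: float, normal_low: float, normal_high: float) -> bool:
    """A value is differentially expressed if it is either over- or under-expressed."""
    return classify_expression(value, normal_low, normal_high) != "normal"


if __name__ == "__main__":
    # Invented numbers in arbitrary units, used only to exercise the sketch.
    print(classify_expression(12.0, 2.0, 8.0))
    print(classify_expression(1.0, 2.0, 8.0))
    print(is_differentially_expressed(5.0, 2.0, 8.0))
```

The same comparison could be made against values from a different disease stage instead of a healthy range, as the definitions allow.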

The terms “differential gene expression” and “differential expression” are used interchangeably to refer to a gene (or its corresponding protein expression product) whose expression is activated to a higher or lower level in a subject suffering from a specific disease, relative to its expression in a normal or control subject. The terms also include genes (or the corresponding protein expression products) whose expression is activated to a higher or lower level at different stages of the same disease. It is also understood that a differentially expressed gene may be either activated or inhibited at the nucleic acid level or protein level, or may be subject to alternative splicing to result in a different polypeptide product. Such differences may be evidenced by a variety of changes including mRNA levels, surface expression, secretion or other partitioning of a polypeptide. Differential gene expression may include a comparison of expression between two or more genes or their gene products; or a comparison of the ratios of the expression between two or more genes or their gene products; or even a comparison of two differently processed products of the same gene, which differ between normal subjects and subjects suffering from a disease; or between various stages of the same disease. Differential expression includes both quantitative and qualitative differences in the temporal or cellular expression pattern in a gene or its expression products among, for example, normal and diseased cells, or among cells which have undergone different disease events or disease stages.

As used herein, “individual” refers to a test subject or patient. The individual can be a mammal or a non-mammal. In various embodiments, the individual is a mammal. A mammalian individual can be a human or non-human. In various embodiments, the individual is a human. A healthy or normal individual is an individual in which the disease or condition of interest (including, for example, lung diseases, lung-associated diseases, or other lung conditions) is not detectable by conventional diagnostic methods.

“Diagnose”, “diagnosing”, “diagnosis”, and variations thereof refer to the detection, determination, or recognition of a health status or condition of an individual on the basis of one or more signs, symptoms, data, or other information pertaining to that individual. The health status of an individual can be diagnosed as healthy/normal (i.e., a diagnosis of the absence of a disease or condition) or diagnosed as ill/abnormal (i.e., a diagnosis of the presence, or an assessment of the characteristics, of a disease or condition). The terms “diagnose”, “diagnosing”, “diagnosis”, etc., encompass, with respect to a particular disease or condition, the initial detection of the disease; the characterization or classification of the disease; the detection of the progression, remission, or recurrence of the disease; and the detection of disease response after the administration of a treatment or therapy to the individual. The diagnosis of lung cancer includes distinguishing individuals, including smokers and nonsmokers, who have cancer from individuals who do not. It further includes distinguishing benign pulmonary nodules from cancerous pulmonary nodules.

“Prognose”, “prognosing”, “prognosis”, and variations thereof refer to the prediction of a future course of a disease or condition in an individual who has the disease or condition (e.g., predicting patient survival), and such terms encompass the evaluation of disease response after the administration of a treatment or therapy to the individual.

“Evaluate”, “evaluating”, “evaluation”, and variations thereof encompass both “diagnose” and “prognose” and also encompass determinations or predictions about the future course of a disease or condition in an individual who does not have the disease as well as determinations or predictions regarding the likelihood that a disease or condition will recur in an individual who apparently has been cured of the disease. The term “evaluate” also encompasses assessing an individual's response to a therapy, such as, for example, predicting whether an individual is likely to respond favorably to a therapeutic agent or is unlikely to respond to a therapeutic agent (or will experience toxic or other undesirable side effects, for example), selecting a therapeutic agent for administration to an individual, or monitoring or determining an individual's response to a therapy that has been administered to the individual. Thus, “evaluating” lung cancer can include, for example, any of the following: prognosing the future course of lung cancer in an individual; predicting the recurrence of lung cancer in an individual who apparently has been cured of lung cancer; or determining or predicting an individual's response to a lung cancer treatment or selecting a lung cancer treatment to administer to an individual based upon a determination of the biomarker values derived from the individual's biological sample.

Any of the following examples may be referred to as either “diagnosing” or “evaluating” lung cancer: initially detecting the presence or absence of lung cancer; determining a specific stage, type or sub-type, or other classification or characteristic of lung cancer; determining whether a pulmonary nodule is a benign lesion or a malignant lung tumor; or detecting/monitoring lung cancer progression (e.g., monitoring lung tumor growth or metastatic spread), remission, or recurrence.

As used herein, “additional biomedical information” refers to one or more evaluations of an individual, other than using any of the biomarkers described herein, that are associated with lung cancer risk. “Additional biomedical information” includes any of the following: physical descriptors of an individual, physical descriptors of a pulmonary nodule observed by CT imaging, the height and/or weight of an individual, the gender of an individual, the ethnicity of an individual, smoking history, occupational history, exposure to known carcinogens (e.g., exposure to any of asbestos, radon gas, chemicals, smoke from fires, and air pollution, which can include emissions from stationary or mobile sources such as industrial/factory or auto/marine/aircraft emissions), exposure to second-hand smoke, family history of lung cancer (or other cancer), the presence of pulmonary nodules, size of nodules, location of nodules, morphology of nodules (e.g., as observed through CT imaging, ground glass opacity (GGO), solid, non-solid), edge characteristics of the nodule (e.g., smooth, lobulated, sharp and smooth, spiculated, infiltrating), and the like. Smoking history is usually quantified in terms of “pack years”, which refers to the number of years a person has smoked multiplied by the average number of packs smoked per day. For example, a person who has smoked, on average, one pack of cigarettes per day for 35 years is referred to as having 35 pack years of smoking history. Additional biomedical information can be obtained from an individual using routine techniques known in the art, such as from the individual themselves by use of a routine patient questionnaire or health history questionnaire, etc., or from a medical practitioner, etc. Alternatively, additional biomedical information can be obtained from routine imaging techniques, including CT imaging (e.g., low-dose CT imaging) and X-ray. Testing of biomarker levels in combination with an evaluation of any additional biomedical information may, for example, improve sensitivity, specificity, and/or AUC for detecting lung cancer (or other lung cancer-related uses) as compared to biomarker testing alone or evaluating any particular item of additional biomedical information alone (e.g., CT imaging alone).
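The pack-year definition given above is simple arithmetic and can be written out directly. The function name is an illustrative choice; the formula and the 35-year example are taken from the text.

```python
def pack_years(years_smoked: float, avg_packs_per_day: float) -> float:
    """Pack years = years smoked x average number of packs smoked per day."""
    if years_smoked < 0 or avg_packs_per_day < 0:
        raise ValueError("inputs must be non-negative")
    return years_smoked * avg_packs_per_day


if __name__ == "__main__":
    # One pack per day for 35 years, as in the example in the text.
    print(pack_years(35, 1.0))
    # Half a pack per day for 20 years (an additional, invented illustration).
    print(pack_years(20, 0.5))
```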

The term “area under the curve” or “AUC” refers to the area under the curve of a receiver operating characteristic (ROC) curve, both of which are well known in the art. AUC measures are useful for comparing the accuracy of a classifier across the complete data range. Classifiers with a greater AUC have a greater capacity to classify unknowns correctly between two groups of interest (e.g., lung cancer samples and normal or control samples). ROC curves are useful for plotting the performance of a particular feature (e.g., any of the biomarkers described herein and/or any item of additional biomedical information) in distinguishing between two populations (e.g., cases having lung cancer and controls without lung cancer). Typically, the feature data across the entire population (e.g., the cases and controls) are sorted in ascending order based on the value of a single feature. Then, for each value for that feature, the true positive and false positive rates for the data are calculated. The true positive rate is determined by counting the number of cases above the value for that feature and then dividing by the total number of cases. The false positive rate is determined by counting the number of controls above the value for that feature and then dividing by the total number of controls. Although this definition refers to scenarios in which a feature is elevated in cases compared to controls, this definition also applies to scenarios in which a feature is lower in cases compared to the controls (in such a scenario, samples below the value for that feature would be counted). ROC curves can be generated for a single feature as well as for other single outputs; for example, two or more features can be mathematically combined (e.g., added, subtracted, multiplied, etc.) to provide a single sum value, and this single sum value can be plotted in a ROC curve. Additionally, any combination of multiple features, in which the combination derives a single output value, can be plotted in a ROC curve. These combinations of features may comprise a test. The ROC curve is the plot of the true positive rate (sensitivity) of a test against the false positive rate (1-specificity) of the test.
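The ROC procedure described above (sort the feature values, then at each value count the cases and controls above it) can be sketched in a few lines of Python. This is a minimal illustration for a feature that is elevated in cases; the function names are assumptions, the trapezoidal rule is one common way to compute the area, and the sample values are invented.

```python
from typing import List, Sequence, Tuple


def roc_points(cases: Sequence[float], controls: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs for one feature.

    For each observed value, cases above it give the true positive rate and
    controls above it give the false positive rate.
    """
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # Sort every distinct observed value in ascending order.
    thresholds = sorted(set(cases) | set(controls))
    # A threshold below all values calls every sample positive.
    points = [(1.0, 1.0)]
    for t in thresholds:
        tpr = sum(v > t for v in cases) / len(cases)
        fpr = sum(v > t for v in controls) / len(controls)
        points.append((fpr, tpr))
    return sorted(points)


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Invented feature values: cases tend to be higher than controls.
    case_values = [3.0, 4.0, 5.0]
    control_values = [1.0, 2.0, 3.0]
    print(auc(roc_points(case_values, control_values)))
```

For a feature that is lower in cases, the same sketch applies after negating the values, which corresponds to counting samples below each value as the text describes.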

As used herein, “detecting” or “determining” with respect to a biomarker value includes the use of both the instrument required to observe and record a signal corresponding to a biomarker value and the material/s required to generate that signal. In various embodiments, the biomarker value is detected using any suitable method, including fluorescence, chemiluminescence, surface plasmon resonance, surface acoustic waves, mass spectrometry, infrared spectroscopy, Raman spectroscopy, atomic force microscopy, scanning tunneling microscopy, electrochemical detection methods, nuclear magnetic resonance, quantum dots, and the like.

“Solid support” refers herein to any substrate having a surface to which molecules may be attached, directly or indirectly, through either covalent or non-covalent bonds. A “solid support” can have a variety of physical formats, which can include, for example, a membrane; a chip (e.g., a protein chip); a slide (e.g., a glass slide or coverslip); a column; a hollow, solid, semi-solid, pore- or cavity-containing particle, such as, for example, a bead; a gel; a fiber, including a fiber optic material; a matrix; and a sample receptacle. Exemplary sample receptacles include sample wells, tubes, capillaries, vials, and any other vessel, groove or indentation capable of holding a sample. A sample receptacle can be contained on a multi-sample platform, such as a microtiter plate, slide, microfluidics device, and the like. A support can be composed of a natural or synthetic material, an organic or inorganic material. The composition of the solid support on which capture reagents are attached generally depends on the method of attachment (e.g., covalent attachment). Other exemplary receptacles include microdroplets and microfluidic controlled or bulk oil/aqueous emulsions within which assays and related manipulations can occur. Suitable solid supports include, for example, plastics, resins, polysaccharides, silica or silica-based materials, functionalized glass, modified silicon, carbon, metals, inorganic glasses, membranes, nylon, natural fibers (such as, for example, silk, wool and cotton), polymers, and the like. The material composing the solid support can include reactive groups such as, for example, carboxy, amino, or hydroxyl groups, which are used for attachment of the capture reagents. Polymeric solid supports can include, e.g., polystyrene, polyethylene glycol tetraphthalate, polyvinyl acetate, polyvinyl chloride, polyvinylpyrrolidone, polyacrylonitrile, polymethyl methacrylate, polytetrafluoroethylene, butyl rubber, styrenebutadiene rubber, natural rubber, polyethylene, polypropylene, (poly)tetrafluoroethylene, (poly)vinylidenefluoride, polycarbonate, and polymethylpentene. Suitable solid support particles that can be used include, e.g., encoded particles, such as Luminex®-type encoded particles, magnetic particles, and glass particles.

Exemplary Uses of Biomarkers

In various exemplary embodiments, methods are provided for diagnosing lung cancer in an individual by detecting one or more biomarker values corresponding to one or more biomarkers that are present in the lung tissue of an individual by any number of analytical methods, including any of the analytical methods described herein. These biomarkers are, for example, differentially expressed in individuals with lung cancer as compared to individuals without lung cancer, particularly NSCLC. Detection of the differential expression of a biomarker in an individual can be used, for example, to permit the early diagnosis of lung cancer, to distinguish between a benign and malignant pulmonary nodule (such as, for example, a nodule observed on a computed tomography (CT) scan), to monitor lung cancer recurrence, or for other clinical indications, including determination of prognosis and methods of treatment.

Any of the biomarkers described herein may be used in a variety of clinical indications for lung cancer, including any of the following: detection of lung cancer (such as in a high-risk individual or population); characterizing lung cancer (e.g., determining lung cancer type, sub-type, or stage), such as by distinguishing between non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC) and/or between adenocarcinoma and squamous cell carcinoma (or otherwise facilitating histopathology); determining whether a lung nodule is a benign nodule or a malignant lung tumor; determining lung cancer prognosis; monitoring lung cancer progression or remission; monitoring for lung cancer recurrence; monitoring metastasis; treatment selection; monitoring response to a therapeutic agent or other treatment; stratification of individuals for computed tomography (CT) screening (e.g., identifying those individuals at greater risk of lung cancer and thereby most likely to benefit from spiral-CT screening, thus increasing the positive predictive value of CT); combining biomarker testing with additional biomedical information, such as smoking history, etc., or with nodule size, morphology, etc. (such as to provide an assay with increased diagnostic performance compared to CT testing or biomarker testing alone); facilitating the diagnosis of a pulmonary nodule as malignant or benign; facilitating clinical decision making once a pulmonary nodule is observed on CT (e.g., ordering repeat CT scans if the nodule is deemed to be low risk, such as if a biomarker-based test is negative, with or without categorization of nodule size, or considering biopsy if the nodule is deemed medium to high risk, such as if a biomarker-based test is positive, with or without categorization of nodule size); and facilitating decisions regarding clinical follow-up (e.g., whether to implement repeat CT scans, fine needle biopsy, or thoracotomy after observing a non-calcified nodule on CT). Biomarker testing may improve positive predictive value (PPV) over CT screening alone. In addition to their utilities in conjunction with CT screening, the biomarkers described herein can also be used in conjunction with any other imaging modalities used for lung cancer, such as chest X-ray. Furthermore, the described biomarkers may also be useful in permitting certain of these uses before indications of lung cancer are detected by imaging modalities or other clinical correlates, or before symptoms appear.

As an example of the manner in which any of the biomarkers described herein can be used to diagnose lung cancer, differential expression of one or more of the described biomarkers in an individual who is not known to have lung cancer may indicate that the individual has lung cancer, thereby enabling detection of lung cancer at an early stage of the disease when treatment is most effective, perhaps before the lung cancer is detected by other means or before symptoms appear. Over-expression of one or more of the biomarkers during the course of lung cancer may be indicative of lung cancer progression, e.g., a lung tumor is growing and/or metastasizing (and thus indicate a poor prognosis), whereas a decrease in the degree to which one or more of the biomarkers is differentially expressed (i.e., in subsequent biomarker tests, the expression level in the individual is moving toward or approaching a “normal” expression level) may be indicative of lung cancer remission, e.g., a lung tumor is shrinking (and thus indicate a good or better prognosis). Similarly, an increase in the degree to which one or more of the biomarkers is differentially expressed (i.e., in subsequent biomarker tests, the expression level in the individual is moving further away from a “normal” expression level) during the course of lung cancer treatment may indicate that the lung cancer is progressing and therefore indicate that the treatment is ineffective, whereas a decrease in differential expression of one or more of the biomarkers during the course of lung cancer treatment may be indicative of lung cancer remission and therefore indicate that the treatment is working successfully. Additionally, an increase or decrease in the differential expression of one or more of the biomarkers after an individual has apparently been cured of lung cancer may be indicative of lung cancer recurrence. In a situation such as this, for example, the individual can be re-started on therapy (or the therapeutic regimen modified such as to increase dosage amount and/or frequency, if the individual has maintained therapy) at an earlier stage than if the recurrence of lung cancer was not detected until later. Furthermore, a differential expression level of one or more of the biomarkers in an individual may be predictive of the individual's response to a particular therapeutic agent. In monitoring for lung cancer recurrence or progression, changes in the biomarker expression levels may indicate the need for repeat imaging (e.g., repeat CT scanning), such as to determine lung cancer activity or to determine the need for changes in treatment.

Detection of any of the biomarkers described herein may be particularly useful following, or in conjunction with, lung cancer treatment, such as to evaluate the success of the treatment or to monitor lung cancer remission, recurrence, and/or progression (including metastasis) following treatment. Lung cancer treatment may include, for example, administration of a therapeutic agent to the individual, performance of surgery (e.g., surgical resection of at least a portion of a lung tumor), administration of radiation therapy, or any other type of lung cancer treatment used in the art, and any combination of these treatments. For example, any of the biomarkers may be detected at least once after treatment or may be detected multiple times after treatment (such as at periodic intervals), or may be detected both before and after treatment. Differential expression levels of any of the biomarkers in an individual over time may be indicative of lung cancer progression, remission, or recurrence, examples of which include any of the following: an increase or decrease in the expression level of the biomarker after treatment compared with the expression level of the biomarker before treatment; an increase or decrease in the expression level of the biomarker at a later time point after treatment compared with the expression level of the biomarker at an earlier time point after treatment; and a differential expression level of the biomarker at a single time point after treatment compared with normal levels of the biomarker.

As a specific example, the biomarker levels for any of the biomarkers described herein can be determined in pre-surgery and post-surgery (e.g., 2-4 weeks after surgery) serum samples. An increase in the biomarker expression level(s) in the post-surgery sample compared with the pre-surgery sample can indicate progression of lung cancer (e.g., unsuccessful surgery), whereas a decrease in the biomarker expression level(s) in the post-surgery sample compared with the pre-surgery sample can indicate regression of lung cancer (e.g., the surgery successfully removed the lung tumor). Similar analyses of the biomarker levels can be carried out before and after other forms of treatment, such as before and after radiation therapy or administration of a therapeutic agent or cancer vaccine.
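The pre-surgery versus post-surgery comparison above can be sketched as a simple rule. This is a hedged illustration only: the function name, the returned labels, and the optional `tolerance` parameter (a stand-in for assay variability) are assumptions made for the sketch, and it applies only to a biomarker that is over-expressed in lung cancer.

```python
def interpret_post_surgery_change(pre_level: float, post_level: float, tolerance: float = 0.0) -> str:
    """Compare post-surgery and pre-surgery levels of one biomarker.

    An increase is read as possible progression and a decrease as possible
    regression, following the example in the text. `tolerance` is a
    hypothetical margin within which the two levels are treated as unchanged.
    """
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    change = post_level - pre_level
    if change > tolerance:
        return "possible progression"
    if change < -tolerance:
        return "possible regression"
    return "no meaningful change"


if __name__ == "__main__":
    # Invented levels in arbitrary units, used only to exercise the sketch.
    print(interpret_post_surgery_change(10.0, 4.0))
    print(interpret_post_surgery_change(10.0, 13.0))
    print(interpret_post_surgery_change(10.0, 10.2, tolerance=0.5))
```

The same comparison could be applied before and after radiation therapy or administration of a therapeutic agent, as the paragraph notes.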

In addition to testing biomarker levels as a stand-alone diagnostic test, biomarker testing can also be done in conjunction with determination of SNPs or other genetic lesions or variability that are indicative of increased risk of, or susceptibility to, disease. (See, e.g., Amos et al., Nature Genetics 40, 616-622 (2008)).

In addition to testing biomarker levels as a stand-alone diagnostic test, biomarker testing can also be done in conjunction with CT screening. For example, the biomarkers may facilitate the medical and economic justification for implementing CT screening, such as for screening large asymptomatic populations at risk for lung cancer (e.g., smokers). For example, a “pre-CT” test of biomarker levels could be used to stratify high-risk individuals for CT screening, such as for identifying those who are at highest risk for lung cancer based on their biomarker levels and who should be prioritized for CT screening. If a CT test is implemented, biomarker levels (e.g., as determined by an aptamer assay of serum or plasma samples) of one or more biomarkers can be measured and the diagnostic score could be evaluated in conjunction with additional biomedical information (e.g., tumor parameters determined by CT testing) to enhance positive predictive value (PPV) over CT or biomarker testing alone. A “post-CT” aptamer panel for determining biomarker levels can be used to determine the likelihood that a pulmonary nodule observed by CT (or other imaging modality) is malignant or benign.

Detection of any of the biomarkers described herein may be useful for post-CT testing. For example, biomarker testing may eliminate or reduce a significant number of false positive tests over CT alone. Further, biomarker testing may facilitate treatment of patients. By way of example, if a lung nodule is less than 5 mm in size, results of biomarker testing may advance patients from “watch and wait” to biopsy at an earlier time; if a lung nodule is 5-9 mm, biomarker testing may eliminate the use of a biopsy or thoracotomy on false positive scans; and if a lung nodule is larger than 10 mm, biomarker testing may eliminate surgery for a sub-population of these patients with benign nodules. Eliminating the need for biopsy in some patients based on biomarker testing would be beneficial because there is significant morbidity associated with nodule biopsy and difficulty in obtaining nodule tissue depending on the location of nodule. Similarly, eliminating the need for surgery in some patients, such as those whose nodules are actually benign, would avoid unnecessary risks and costs associated with surgery.
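The three nodule-size examples above can be laid out as a lookup, purely to make the size bands explicit. The function name and return strings are illustrative assumptions, and the sketch is not a clinical decision rule. The text does not address nodules larger than 9 mm and up to 10 mm, so the sketch returns `None` for that range instead of guessing.

```python
from typing import Optional


def biomarker_role_by_nodule_size(size_mm: float) -> Optional[str]:
    """Map a CT-observed nodule size to the example use of biomarker testing.

    Size bands follow the examples in the text; sizes not covered there
    (greater than 9 mm and up to 10 mm) return None.
    """
    if size_mm <= 0:
        raise ValueError("size_mm must be positive")
    if size_mm < 5:
        return "may advance patient from watch-and-wait to earlier biopsy"
    if size_mm <= 9:
        return "may avoid biopsy or thoracotomy on false positive scans"
    if size_mm > 10:
        return "may avoid surgery for patients with benign nodules"
    return None  # band not addressed in the text


if __name__ == "__main__":
    for size in (3.0, 7.0, 9.5, 14.0):
        print(size, biomarker_role_by_nodule_size(size))
```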

In addition to testing biomarker levels in conjunction with CT screening (e.g., assessing biomarker levels in conjunction with size or other characteristics of a lung nodule observed on a CT scan), information regarding the biomarkers can also be evaluated in conjunction with other types of data, particularly data that indicates an individual's risk for lung cancer (e.g., patient clinical history, symptoms, family history of cancer, risk factors such as whether or not the individual is a smoker, and/or status of other biomarkers, etc.). These various data can be assessed by automated methods, such as a computer program/software, which can be embodied in a computer or other apparatus/device.

Any of the described biomarkers may also be used in imaging tests. For example, an imaging agent can be coupled to any of the described biomarkers, which can be used to aid in lung cancer diagnosis, to monitor disease progression/remission or metastasis, to monitor for disease recurrence, or to monitor response to therapy, among other uses.

Detection and Determination of Biomarkers and Biomarker Values

A biomarker value for the biomarkers described herein can be detected using any of a variety of known analytical methods. In one embodiment, a biomarker value is detected using a capture reagent. As used herein, a “capture agent” or “capture reagent” refers to a molecule that is capable of binding specifically to a biomarker. In various embodiments, the capture reagent can be exposed to the biomarker in solution or can be exposed to the biomarker while the capture reagent is immobilized on a solid support. In other embodiments, the capture reagent contains a feature that is reactive with a secondary feature on a solid support. In these embodiments, the capture reagent can be exposed to the biomarker in solution, and then the feature on the capture reagent can be used in conjunction with the secondary feature on the solid support to immobilize the biomarker on the solid support. The capture reagent is selected based on the type of analysis to be conducted. Capture reagents include but are not limited to aptamers, antibodies, adnectins, ankyrins, other antibody mimetics and other protein scaffolds, autoantibodies, chimeras, small molecules, an F(ab′)₂ fragment, a single chain antibody fragment, an Fv fragment, a single chain Fv fragment, a nucleic acid, a lectin, a ligand-binding receptor, affibodies, nanobodies, imprinted polymers, avimers, peptidomimetics, a hormone receptor, a cytokine receptor, and synthetic receptors, and modifications and fragments of these.

In some embodiments, a biomarker value is detected using a biomarker/capture reagent complex.

In other embodiments, the biomarker value is derived from the biomarker/capture reagent complex and is detected indirectly, such as, for example, as a result of a reaction that is subsequent to the biomarker/capture reagent interaction, but is dependent on the formation of the biomarker/capture reagent complex.

In some embodiments, the biomarker value is detected directly from the biomarker in a biological sample.

In one embodiment, the biomarkers are detected using a multiplexed format that allows for the simultaneous detection of two or more biomarkers in a biological sample. In one embodiment of the multiplexed format, capture reagents are immobilized, directly or indirectly, covalently or non-covalently, in discrete locations on a solid support. In another embodiment, a multiplexed format uses discrete solid supports where each solid support has a unique capture reagent associated with that solid support, such as, for example, quantum dots. In another embodiment, an individual device is used for the detection of each one of multiple biomarkers to be detected in a biological sample. Individual devices can be configured to permit each biomarker in the biological sample to be processed simultaneously. For example, a microtiter plate can be used such that each well in the plate is used to uniquely analyze one of multiple biomarkers to be detected in a biological sample.

In one or more of the foregoing embodiments, a fluorescent tag can be used to label a component of the biomarker/capture complex to enable the detection of the biomarker value. In various embodiments, the fluorescent label can be conjugated to a capture reagent specific to any of the biomarkers described herein using known techniques, and the fluorescent label can then be used to detect the corresponding biomarker value. Suitable fluorescent labels include rare earth chelates, fluorescein and its derivatives, rhodamine and its derivatives, dansyl, allophycocyanin, PBXL-3, Qdot 605, Lissamine, phycoerythrin, Texas Red, and other such compounds.

In one embodiment, the fluorescent label is a fluorescent dye molecule. In some embodiments, the fluorescent dye molecule includes at least one substituted indolium ring system in which the substituent on the 3-carbon of the indolium ring contains a chemically reactive group or a conjugated substance. In some embodiments, the dye molecule includes an AlexaFluor molecule, such as, for example, AlexaFluor 488, AlexaFluor 532, AlexaFluor 647, AlexaFluor 680, or AlexaFluor 700. In other embodiments, the dye molecule includes a first type and a second type of dye molecule, such as, e.g., two different AlexaFluor molecules. In other embodiments, the dye molecule includes a first type and a second type of dye molecule, and the two dye molecules have different emission spectra.

Fluorescence can be measured with a variety of instrumentation compatible with a wide range of assay formats. For example, spectrofluorimeters have been designed to analyze microtiter plates, microscope slides, printed arrays, cuvettes, etc. See Principles of Fluorescence Spectroscopy, by J. R. Lakowicz, Springer Science+Business Media, Inc., 2004; Bioluminescence & Chemiluminescence: Progress & Current Applications; Philip E. Stanley and Larry J. Kricka editors, World Scientific Publishing Company, January 2002.

In one or more of the foregoing embodiments, a chemiluminescence tag can optionally be used to label a component of the biomarker/capture complex to enable the detection of a biomarker value. Suitable chemiluminescent materials include any of oxalyl chloride, Rhodamine 6G, Ru(bipy)₃²⁺, TMAE (tetrakis(dimethylamino)ethylene), pyrogallol (1,2,3-trihydroxybenzene), Lucigenin, peroxyoxalates, aryl oxalates, acridinium esters, dioxetanes, and others.

In yet other embodiments, the detection method includes an enzyme/substrate combination that generates a detectable signal that corresponds to the biomarker value. Generally, the enzyme catalyzes a chemical alteration of a chromogenic substrate, which can be measured using various techniques, including spectrophotometry, fluorescence, and chemiluminescence. Suitable enzymes include, for example, luciferases, luciferin, malate dehydrogenase, urease, horseradish peroxidase (HRPO), alkaline phosphatase, beta-galactosidase, glucoamylase, lysozyme, glucose oxidase, galactose oxidase, and glucose-6-phosphate dehydrogenase, uricase, xanthine oxidase, lactoperoxidase, microperoxidase, and the like.

In yet other embodiments, the detection method can be a combination of fluorescence, chemiluminescence, radionuclide or enzyme/substrate combinations that generate a measurable signal. Multimodal signaling could have unique and advantageous characteristics in biomarker assay formats.

More specifically, the biomarker values for the biomarkers described herein can be detected using known analytical methods including singleplex aptamer assays, multiplexed aptamer assays, singleplex or multiplexed immunoassays, mRNA expression profiling, miRNA expression profiling, mass spectrometric analysis, histological/cytological methods, etc., as detailed below.

Determination of Biomarker Values Using Aptamer-Based Assays

Assays directed to the detection and quantification of physiologically significant molecules in biological samples and other samples are important tools in scientific research and in the health care field. One class of such assays involves the use of a microarray that includes one or more aptamers immobilized on a solid support. The aptamers are each capable of binding to a target molecule in a highly specific manner and with very high affinity. See, e.g., U.S. Pat. No. 5,475,096 entitled “Nucleic Acid Ligands,” see also, e.g., U.S. Pat. No. 6,242,246, U.S. Pat. No. 6,458,543, and U.S. Pat. No. 6,503,715, each of which is entitled “Nucleic Acid Ligand Diagnostic Biochip”. Once the microarray is contacted with a sample, the aptamers bind to their respective target molecules present in the sample and thereby enable a determination of a biomarker value corresponding to a biomarker.

As used herein, an “aptamer” refers to a nucleic acid that has a specific binding affinity for a target molecule. It is recognized that affinity interactions are a matter of degree; however, in this context, the “specific binding affinity” of an aptamer for its target means that the aptamer binds to its target generally with a much higher degree of affinity than it binds to other components in a test sample. An “aptamer” is a set of copies of one type or species of nucleic acid molecule that has a particular nucleotide sequence. An aptamer can include any suitable number of nucleotides, including any number of chemically modified nucleotides. “Aptamers” refers to more than one such set of molecules. Different aptamers can have either the same or different numbers of nucleotides. Aptamers can be DNA or RNA or chemically modified nucleic acids and can be single stranded, double stranded, or contain double stranded regions, and can include higher ordered structures. An aptamer can also be a photoaptamer, where a photoreactive or chemically reactive functional group is included in the aptamer to allow it to be covalently linked to its corresponding target. Any of the aptamer methods disclosed herein can include the use of two or more aptamers that specifically bind the same target molecule. As further described below, an aptamer may include a tag. If an aptamer includes a tag, all copies of the aptamer need not have the same tag. Moreover, if different aptamers each include a tag, these different aptamers can have either the same tag or a different tag.

An aptamer can be identified using any known method, including the SELEX process. Once identified, an aptamer can be prepared or synthesized in accordance with any known method, including chemical synthetic methods and enzymatic synthetic methods.

The terms “SELEX” and “SELEX process” are used interchangeably herein to refer generally to a combination of (1) the selection of aptamers that interact with a target molecule in a desirable manner, for example binding with high affinity to a protein, with (2) the amplification of those selected nucleic acids. The SELEX process can be used to identify aptamers with high affinity to a specific target or biomarker.

SELEX generally includes preparing a candidate mixture of nucleic acids, binding of the candidate mixture to the desired target molecule to form an affinity complex, separating the affinity complexes from the unbound candidate nucleic acids, separating and isolating the nucleic acid from the affinity complex, purifying the nucleic acid, and identifying a specific aptamer sequence. The process may include multiple rounds to further refine the affinity of the selected aptamer. The process can include amplification steps at one or more points in the process. See, e.g., U.S. Pat. No. 5,475,096, entitled “Nucleic Acid Ligands.” The SELEX process can be used to generate an aptamer that covalently binds its target as well as an aptamer that non-covalently binds its target. See, e.g., U.S. Pat. No. 5,705,337 entitled “Systematic Evolution of Nucleic Acid Ligands by Exponential Enrichment: Chemi-SELEX.”

The SELEX process can be used to identify high-affinity aptamers containing modified nucleotides that confer improved characteristics on the aptamer, such as, for example, improved in vivo stability or improved delivery characteristics. Examples of such modifications include chemical substitutions at the ribose and/or phosphate and/or base positions. SELEX process-identified aptamers containing modified nucleotides are described in U.S. Pat. No. 5,660,985, entitled “High Affinity Nucleic Acid Ligands Containing Modified Nucleotides,” which describes oligonucleotides containing nucleotide derivatives chemically modified at the 5′- and 2′-positions of pyrimidines. U.S. Pat. No. 5,580,737, see supra, describes highly specific aptamers containing one or more nucleotides modified with 2′-amino (2′-NH₂), 2′-fluoro (2′-F), and/or 2′-O-methyl (2′-OMe). See also, U.S. Patent Application Publication 20090098549, entitled “SELEX and PHOTOSELEX,” which describes nucleic acid libraries having expanded physical and chemical properties and their use in SELEX and photoSELEX.

SELEX can also be used to identify aptamers that have desirable off-rate characteristics. See U.S. Patent Application Publication 20090004667, entitled “Method for Generating Aptamers with Improved Off-Rates,” which describes improved SELEX methods for generating aptamers that can bind to target molecules. Methods for producing aptamers and photoaptamers having slower rates of dissociation from their respective target molecules are described. The methods involve contacting the candidate mixture with the target molecule, allowing the formation of nucleic acid-target complexes to occur, and performing a slow off-rate enrichment process wherein nucleic acid-target complexes with fast dissociation rates will dissociate and not reform, while complexes with slow dissociation rates will remain intact. Additionally, the methods include the use of modified nucleotides in the production of candidate nucleic acid mixtures to generate aptamers with improved off-rate performance.
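The rationale for a slow off-rate enrichment step can be illustrated with simple first-order dissociation kinetics. The following is a minimal sketch, assuming that each complex dissociates with a single rate constant and does not reform; the rate constants and the enrichment time are hypothetical values chosen only to show the contrast between fast and slow complexes.

```python
import math


def fraction_remaining(k_off_per_min, minutes):
    """Fraction of complexes still intact after first-order dissociation.

    Assumes no rebinding, i.e. dissociated complexes do not reform.
    """
    return math.exp(-k_off_per_min * minutes)


def dissociation_half_life(k_off_per_min):
    """Time, in minutes, for half of the complexes to dissociate."""
    return math.log(2) / k_off_per_min


# Hypothetical dissociation rate constants (per minute).
K_OFF_FAST = 0.5
K_OFF_SLOW = 0.005
ENRICHMENT_MINUTES = 30  # hypothetical duration of the enrichment step

for name, k_off in (("fast", K_OFF_FAST), ("slow", K_OFF_SLOW)):
    left = fraction_remaining(k_off, ENRICHMENT_MINUTES)
    half = dissociation_half_life(k_off)
    print(f"{name}: half-life {half:.1f} min, fraction intact {left:.4f}")
```

With these assumed values, essentially none of the fast-dissociating complexes survive the enrichment period, while most of the slow-dissociating complexes remain intact and can be carried forward.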

A variation of this assay employs aptamers that include photoreactive functional groups that enable the aptamers to covalently bind or “photocrosslink” their target molecules. See, e.g., U.S. Pat. No. 6,544,776 entitled “Nucleic Acid Ligand Diagnostic Biochip.” These photoreactive aptamers are also referred to as photoaptamers. See, e.g., U.S. Pat. No. 5,763,177, U.S. Pat. No. 6,001,577, and U.S. Pat. No. 6,291,184, each of which is entitled “Systematic Evolution of Nucleic Acid Ligands by Exponential Enrichment: Photoselection of Nucleic Acid Ligands and Solution SELEX,” see also, e.g., U.S. Pat. No. 6,458,539, entitled “Photoselection of Nucleic Acid Ligands.” After the microarray is contacted with the sample and the photoaptamers have had an opportunity to bind to their target molecules, the photoaptamers are photoactivated, and the solid support is washed to remove any non-specifically bound molecules. Harsh wash conditions may be used, since target molecules that are bound to the photoaptamers are generally not removed, due to the covalent bonds created by the photoactivated functional group(s) on the photoaptamers. In this manner, the assay enables the detection of a biomarker value corresponding to a biomarker in the test sample.

In both of these assay formats, the aptamers are immobilized on the solid support prior to being contacted with the sample. Under certain circumstances, however, immobilization of the aptamers prior to contact with the sample may not provide an optimal assay. For example, pre-immobilization of the aptamers may result in inefficient mixing of the aptamers with the target molecules on the surface of the solid support, perhaps leading to lengthy reaction times and, therefore, extended incubation periods to permit efficient binding of the aptamers to their target molecules. Further, when photoaptamers are employed in the assay and depending upon the material utilized as a solid support, the solid support may tend to scatter or absorb the light used to effect the formation of covalent bonds between the photoaptamers and their target molecules. Moreover, depending upon the method employed, detection of target molecules bound to their aptamers can be subject to imprecision, since the surface of the solid support may also be exposed to and affected by any labeling agents that are used. Finally, immobilization of the aptamers on the solid support generally involves an aptamer-preparation step (i.e., the immobilization) prior to exposure of the aptamers to the sample, and this preparation step may affect the activity or functionality of the aptamers.

Aptamer assays that permit an aptamer to capture its target in solution and then employ separation steps that are designed to remove specific components of the aptamer-target mixture prior to detection have also been described (see U.S. Patent Application Publication 20090042206, entitled “Multiplexed Analyses of Test Samples”). The described aptamer assay methods enable the detection and quantification of a non-nucleic acid target (e.g., a protein target) in a test sample by detecting and quantifying a nucleic acid (i.e., an aptamer). The described methods create a nucleic acid surrogate (i.e., the aptamer) for detecting and quantifying a non-nucleic acid target, thus allowing the wide variety of nucleic acid technologies, including amplification, to be applied to a broader range of desired targets, including protein targets.

Aptamers can be constructed to facilitate the separation of the assay components from an aptamer biomarker complex (or photoaptamer biomarker covalent complex) and permit isolation of the aptamer for detection and/or quantification. In one embodiment, these constructs can include a cleavable or releasable element within the aptamer sequence. In other embodiments, additional functionality can be introduced into the aptamer, for example, a labeled or detectable component, a spacer component, or a specific binding tag or immobilization element. For example, the aptamer can include a tag connected to the aptamer via a cleavable moiety, a label, a spacer component separating the label, and the cleavable moiety. In one embodiment, a cleavable element is a photocleavable linker. The photocleavable linker can be attached to a biotin moiety and a spacer section, can include an NHS group for derivatization of amines, and can be used to introduce a biotin group to an aptamer, thereby allowing for the release of the aptamer later in an assay method.

Homogeneous assays, done with all assay components in solution, do not require separation of sample and reagents prior to the detection of signal. These methods are rapid and easy to use. These methods generate signal based on a molecular capture or binding reagent that reacts with its specific target. For lung cancer, the molecular capture reagent would be an aptamer or an antibody or the like and the specific target would be a lung cancer biomarker of Table 20.

In one embodiment, a method for signal generation takes advantage of anisotropy signal change due to the interaction of a fluorophore-labeled capture reagent with its specific biomarker target. When the labeled capture reagent reacts with its target, the increased molecular weight causes the rotational motion of the fluorophore attached to the complex to become much slower, changing the anisotropy value. By monitoring the anisotropy change, binding events may be used to quantitatively measure the biomarkers in solutions. Other methods include fluorescence polarization assays, molecular beacon methods, time resolved fluorescence quenching, chemiluminescence, fluorescence resonance energy transfer, and the like.
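As a minimal sketch of how an anisotropy change could be turned into a measure of binding, fluorescence anisotropy can be computed from the parallel and perpendicular emission intensities as r = (I∥ − G·I⊥)/(I∥ + 2·G·I⊥), where G is an instrument correction factor. The bound fraction is then estimated by linear interpolation between the anisotropy of the free and fully bound labeled reagent, which assumes that the fluorescence intensity of the label does not change on binding. All intensity and reference values below are hypothetical.

```python
def anisotropy(i_parallel, i_perpendicular, g_factor=1.0):
    """Steady-state fluorescence anisotropy from polarized intensities."""
    corrected_perp = g_factor * i_perpendicular
    return (i_parallel - corrected_perp) / (i_parallel + 2.0 * corrected_perp)


def fraction_bound(r_observed, r_free, r_bound):
    """Fraction of labeled capture reagent bound to its target.

    Linear interpolation between the free and bound anisotropy values;
    assumes equal fluorescence intensity in both states.
    """
    return (r_observed - r_free) / (r_bound - r_free)


# Hypothetical reference anisotropies for free and fully bound reagent.
R_FREE = 0.05
R_BOUND = 0.25

# Hypothetical polarized intensities measured for a sample.
r_sample = anisotropy(i_parallel=1450.0, i_perpendicular=1000.0)
print(f"sample anisotropy: {r_sample:.3f}")
print(f"estimated fraction bound: {fraction_bound(r_sample, R_FREE, R_BOUND):.2f}")
```

The bound fraction obtained in this way could then be related to biomarker concentration through a calibration against samples of known concentration.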

An exemplary solution-based aptamer assay that can be used to detect a biomarker value corresponding to a biomarker in a biological sample includes the following: (a) preparing a mixture by contacting the biological sample with an aptamer that includes a first tag and has a specific affinity for the biomarker, wherein an aptamer affinity complex is formed when the biomarker is present in the sample; (b) exposing the mixture to a first solid support including a first capture element, and allowing the first tag to associate with the first capture element; (c) removing any components of the mixture not associated with the first solid support; (d) attaching a second tag to the biomarker component of the aptamer affinity complex; (e) releasing the aptamer affinity complex from the first solid support; (f) exposing the released aptamer affinity complex to a second solid support that includes a second capture element and allowing the second tag to associate with the second capture element; (g) removing any non-complexed aptamer from the mixture by partitioning the non-complexed aptamer from the aptamer affinity complex; (h) eluting the aptamer from the solid support; and (i) detecting the biomarker by detecting the aptamer component of the aptamer affinity complex.

Determination of Biomarker Values Using Immunoassays

Immunoassay methods are based on the reaction of an antibody to its corresponding target or analyte and can detect the analyte in a sample depending on the specific assay format. To improve specificity and sensitivity of an assay method based on immuno-reactivity, monoclonal antibodies are often used because of their specific epitope recognition. Polyclonal antibodies have also been successfully used in various immunoassays because of their increased affinity for the target as compared to monoclonal antibodies. Immunoassays have been designed for use with a wide range of biological sample matrices. Immunoassay formats have been designed to provide qualitative, semi-quantitative, and quantitative results.

Quantitative results are generated through the use of a standard curve created with known concentrations of the specific analyte to be detected. The response or signal from an unknown sample is plotted onto the standard curve, and a quantity or value corresponding to the target in the unknown sample is established.
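The reading of an unknown sample from a standard curve can be sketched as follows. This is an illustrative example only: it assumes a signal that increases monotonically with concentration and uses simple piecewise-linear interpolation between standards, whereas in practice a fitted model (for example, a four-parameter logistic curve) is often used. The standard concentrations, signals, and units are hypothetical.

```python
def interpolate_concentration(signal, standards):
    """Read a concentration off a standard curve by linear interpolation.

    standards is a list of (concentration, signal) pairs; the signal is
    assumed to increase monotonically with concentration.
    """
    points = sorted(standards, key=lambda p: p[1])
    if not points[0][1] <= signal <= points[-1][1]:
        raise ValueError("signal is outside the range of the standard curve")
    for (c_lo, s_lo), (c_hi, s_hi) in zip(points, points[1:]):
        if s_lo <= signal <= s_hi:
            if s_hi == s_lo:
                return c_lo
            fraction = (signal - s_lo) / (s_hi - s_lo)
            return c_lo + fraction * (c_hi - c_lo)


# Hypothetical standards: (concentration in ng/mL, measured signal).
STANDARDS = [(0.0, 0.05), (10.0, 0.25), (50.0, 0.95), (100.0, 1.60)]

unknown_signal = 0.60  # hypothetical reading from an unknown sample
estimate = interpolate_concentration(unknown_signal, STANDARDS)
print(f"estimated concentration: {estimate:.1f} ng/mL")
```

Signals that fall outside the range spanned by the standards are rejected rather than extrapolated, since the curve is not characterized there.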

Numerous immunoassay formats have been designed. ELISA or EIA can be quantitative for the detection of an analyte. This method relies on attachment of a label to either the analyte or the antibody and the label component includes, either directly or indirectly, an enzyme. ELISA tests may be formatted for direct, indirect, competitive, or sandwich detection of the analyte. Other methods rely on labels such as, for example, radioisotopes (¹²⁵I) or fluorescence. Additional techniques include, for example, agglutination, nephelometry, turbidimetry, Western blot, immunoprecipitation, immunocytochemistry, immunohistochemistry, flow cytometry, Luminex assay, and others (see ImmunoAssay: A Practical Guide, edited by Brian Law, published by Taylor & Francis, Ltd., 2005 edition).

Exemplary assay formats include enzyme-linked immunosorbent assay (ELISA), radioimmunoassay, fluorescent, chemiluminescence, and fluorescence resonance energy transfer (FRET) or time resolved-FRET (TR-FRET) immunoassays. Examples of procedures for detecting biomarkers include biomarker immunoprecipitation followed by quantitative methods that allow size and peptide level discrimination, such as gel electrophoresis, capillary electrophoresis, planar electrochromatography, and the like.

Methods of detecting and/or quantifying a detectable label or signal generating material depend on the nature of the label. The products of reactions catalyzed by appropriate enzymes (where the detectable label is an enzyme; see above) can be, without limitation, fluorescent, luminescent, or radioactive or they may absorb visible or ultraviolet light. Examples of detectors suitable for detecting such detectable labels include, without limitation, x-ray film, radioactivity counters, scintillation counters, spectrophotometers, colorimeters, fluorometers, luminometers, and densitometers.

Any of the methods for detection can be performed in any format that allows for any suitable preparation, processing, and analysis of the reactions. This can be, for example, in multi-well assay plates (e.g., 96 wells or 384 wells) or using any suitable array or microarray. Stock solutions for various agents can be made manually or robotically, and all subsequent pipetting, diluting, mixing, distribution, washing, incubating, sample readout, data collection and analysis can be done robotically using commercially available analysis software, robotics, and detection instrumentation capable of detecting a detectable label.

Determination of Biomarker Values Using Gene Expression Profiling

Measuring mRNA in a biological sample may be used as a surrogate for detection of the level of the corresponding protein in the biological sample. Thus, any of the biomarkers or biomarker panels described herein can also be detected by detecting the appropriate RNA.

mRNA expression levels are measured by reverse transcription quantitative polymerase chain reaction (RT-PCR followed by qPCR). RT-PCR is used to create a cDNA from the mRNA. The cDNA may be used in a qPCR assay to produce fluorescence as the DNA amplification process progresses. By comparison to a standard curve, qPCR can produce an absolute measurement such as number of copies of mRNA per cell. Northern blots, microarrays, Invader assays, and RT-PCR combined with capillary electrophoresis have all been used to measure expression levels of mRNA in a sample (see Gene Expression Profiling: Methods and Protocols, Richard A. Shimkets, editor, Humana Press, 2004).
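Absolute quantification by qPCR against a standard curve can be sketched as follows. The example assumes the usual linear relationship between the threshold cycle (Ct) and the base-10 logarithm of the starting copy number, fits that line to a dilution series by least squares, and inverts it for an unknown sample. The dilution series, Ct values, and unknown reading are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_from_ct(ct, slope, intercept):
    """Invert Ct = slope * log10(copies) + intercept."""
    return 10.0 ** ((ct - intercept) / slope)


# Hypothetical dilution series: (starting copies, measured Ct).
STANDARDS = [(1e2, 31.36), (1e3, 28.04), (1e4, 24.72),
             (1e5, 21.40), (1e6, 18.08)]

log_copies = [math.log10(copies) for copies, _ in STANDARDS]
cts = [ct for _, ct in STANDARDS]
slope, intercept = fit_line(log_copies, cts)

# Amplification efficiency implied by the slope (1.0 means perfect doubling).
efficiency = 10.0 ** (-1.0 / slope) - 1.0

unknown_ct = 26.38  # hypothetical Ct of an unknown sample
copies = copies_from_ct(unknown_ct, slope, intercept)
print(f"slope {slope:.2f}, efficiency {efficiency:.2%}")
print(f"estimated starting copies: {copies:.0f}")
```

Dividing the estimated copy number by the number of cells represented in the reaction would give a per-cell value of the kind mentioned above.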

miRNA molecules are small RNAs that are non-coding but may regulate gene expression. Any of the methods suited to the measurement of mRNA expression levels can also be used for the corresponding miRNA. Recently many laboratories have investigated the use of miRNAs as biomarkers for disease. Many diseases involve wide-spread transcriptional regulation, and it is not surprising that miRNAs might find a role as biomarkers. The connection between miRNA concentrations and disease is often even less clear than the connections between protein levels and disease, yet the value of miRNA biomarkers might be substantial. Of course, as with any RNA expressed differentially during disease, the problems facing the development of an in vitro diagnostic product will include the requirement that the miRNAs survive in the diseased cell and are easily extracted for analysis, or that the miRNAs are released into blood or other matrices where they must survive long enough to be measured. Protein biomarkers have similar requirements, although many potential protein biomarkers are secreted intentionally at the site of pathology and function, during disease, in a paracrine fashion. Many potential protein biomarkers are designed to function outside the cells within which those proteins are synthesized.

Detection of Biomarkers Using In Vivo Molecular Imaging Technologies

Any of the described biomarkers (see Table 20) may also be used in molecular imaging tests. For example, an imaging agent can be coupled to any of the described biomarkers, which can be used to aid in lung cancer diagnosis, to monitor disease progression/remission or metastasis, to monitor for disease recurrence, or to monitor response to therapy, among other uses.

In vivo imaging technologies provide non-invasive methods for determining the state of a particular disease in the body of an individual. For example, entire portions of the body, or even the entire body, may be viewed as a three dimensional image, thereby providing valuable information concerning morphology and structures in the body. Such technologies may be combined with the detection of the biomarkers described herein to provide information concerning the cancer status, in particular the lung cancer status, of an individual.

The use of in vivo molecular imaging technologies is expanding due to various advances in technology. These advances include the development of new contrast agents or labels, such as radiolabels and/or fluorescent labels, which can provide strong signals within the body; and the development of powerful new imaging technology, which can detect and analyze these signals from outside the body, with sufficient sensitivity and accuracy to provide useful information. The contrast agent can be visualized in an appropriate imaging system, thereby providing an image of the portion or portions of the body in which the contrast agent is located. The contrast agent may be bound to or associated with a capture reagent, such as an aptamer or an antibody, for example, and/or with a peptide or protein, or an oligonucleotide (for example, for the detection of gene expression), or a complex containing any of these with one or more macromolecules and/or other particulate forms.

The contrast agent may also feature a radioactive atom that is useful in imaging. Suitable radioactive atoms include technetium-99m or iodine-123 for scintigraphic studies. Other readily detectable moieties include, for example, spin labels for magnetic resonance imaging (MRI) such as, for example, iodine-123 again, iodine-131, indium-111, fluorine-19, carbon-13, nitrogen-15, oxygen-17, gadolinium, manganese or iron. Such labels are well known in the art and could easily be selected by one of ordinary skill in the art.

Standard imaging techniques include but are not limited to magnetic resonance imaging, computed tomography scanning, positron emission tomography (PET), single photon emission computed tomography (SPECT), and the like. For diagnostic in vivo imaging, the type of detection instrument available is a major factor in selecting a given contrast agent, such as a given radionuclide and the particular biomarker that it is used to target (protein, mRNA, and the like). The radionuclide chosen typically has a type of decay that is detectable by a given type of instrument. Also, when selecting a radionuclide for in vivo diagnosis, its half-life should be long enough to enable detection at the time of maximum uptake by the target tissue but short enough that deleterious radiation of the host is minimized.
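The half-life consideration can be illustrated with the exponential decay law, under which the fraction of activity remaining after time t is 0.5 raised to the power t divided by the half-life. The sketch below uses approximate physical half-lives for several radionuclides named in this section; the uptake time is a hypothetical value, and the calculation ignores biological clearance of the agent.

```python
def fraction_remaining(half_life_hours, elapsed_hours):
    """Fraction of the initial activity left after physical decay alone."""
    return 0.5 ** (elapsed_hours / half_life_hours)


# Approximate physical half-lives, in hours.
HALF_LIFE_HOURS = {
    "carbon-11": 0.34,
    "fluorine-18": 1.83,
    "technetium-99m": 6.01,
    "iodine-123": 13.2,
}

UPTAKE_HOURS = 4.0  # hypothetical time of maximum uptake by the target tissue

for nuclide, half_life in HALF_LIFE_HOURS.items():
    left = fraction_remaining(half_life, UPTAKE_HOURS)
    print(f"{nuclide}: {left:.1%} of activity left at {UPTAKE_HOURS:g} h")
```

For an agent that reaches maximum uptake only after several hours, the shorter-lived nuclides would have largely decayed before imaging, while the longer-lived ones retain most of their activity and continue to irradiate the host afterwards.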

Exemplary imaging techniques include but are not limited to PET and SPECT, which are imaging techniques in which a radionuclide is systemically or locally administered to an individual. The subsequent uptake of the radiotracer is measured over time and used to obtain information about the targeted tissue and the biomarker. Because of the high-energy (gamma-ray) emissions of the specific isotopes employed and the sensitivity and sophistication of the instruments used to detect them, the two-dimensional distribution of radioactivity may be inferred from outside of the body.

Commonly used positron-emitting nuclides in PET include, for example, carbon-11, nitrogen-13, oxygen-15, and fluorine-18. Isotopes that decay by electron capture and/or gamma-emission are used in SPECT and include, for example, iodine-123 and technetium-99m. An exemplary method for labeling amino acids with technetium-99m is the reduction of pertechnetate ion in the presence of a chelating precursor to form the labile technetium-99m-precursor complex, which, in turn, reacts with the metal binding group of a bifunctionally modified chemotactic peptide to form a technetium-99m-chemotactic peptide conjugate.

Antibodies are frequently used for such in vivo imaging diagnostic methods. The preparation and use of antibodies for in vivo diagnosis is well known in the art. Labeled antibodies which specifically bind any of the biomarkers in Table 20 can be injected into an individual suspected of having a certain type of cancer (e.g., lung cancer), detectable according to the particular biomarker used, for the purpose of diagnosing or evaluating the disease status of the individual. The label used will be selected in accordance with the imaging modality to be used, as previously described. Localization of the label permits determination of the spread of the cancer. The amount of label within an organ or tissue also allows determination of the presence or absence of cancer in that organ or tissue.

Similarly, aptamers may be used for such in vivo imaging diagnostic methods. For example, an aptamer that was used to identify a particular biomarker described in Table 20 (and therefore binds specifically to that particular biomarker) may be appropriately labeled and injected into an individual suspected of having lung cancer, detectable according to the particular biomarker, for the purpose of diagnosing or evaluating the lung cancer status of the individual. The label used will be selected in accordance with the imaging modality to be used, as previously described. Localization of the label permits determination of the spread of the cancer. The amount of label within an organ or tissue also allows determination of the presence or absence of cancer in that organ or tissue. Aptamer-directed imaging agents could have unique and advantageous characteristics relating to tissue penetration, tissue distribution, kinetics, elimination, potency, and selectivity as compared to other imaging agents.

Such techniques may also optionally be performed with labeled oligonucleotides, for example, for detection of gene expression through imaging with antisense oligonucleotides. These methods are used for in situ hybridization, for example, with fluorescent molecules or radionuclides as the label. Other methods for detection of gene expression include, for example, detection of the activity of a reporter gene.

Another general type of imaging technology is optical imaging, in which fluorescent signals within the subject are detected by an optical device that is external to the subject. These signals may be due to actual fluorescence and/or to bioluminescence. Improvements in the sensitivity of optical detection devices have increased the usefulness of optical imaging for in vivo diagnostic assays.

The use of in vivo molecular biomarker imaging is increasing, including for clinical trials, for example, to more rapidly measure clinical efficacy in trials for new cancer therapies and/or to avoid prolonged treatment with a placebo for those diseases, such as multiple sclerosis, in which such prolonged treatment may be considered to be ethically questionable.

For a review of other techniques, see N. Blow, Nature Methods, 6, 465-469, 2009.

Determination of Biomarker Values Using Histology/Cytology Methods

For evaluation of lung cancer, a variety of tissue samples may be used in histological or cytological methods. Sample selection depends on the primary tumor location and sites of metastases. For example, endo- and trans-bronchial biopsies, fine needle aspirates, cutting needles, and core biopsies can be used for histology. Bronchial washing and brushing, pleural aspiration, and sputum can be used for cytology. While cytological analysis is still used in the diagnosis of lung cancer, histological methods are known to provide better sensitivity for the detection of cancer. Any of the biomarkers identified herein that were shown to be up-regulated (see Table 19) in the individuals with lung cancer can be used to stain a histological specimen as an indication of disease.

In one embodiment, one or more capture reagent(s) specific to the corresponding biomarker(s) are used in a cytological evaluation of a lung cell sample, and the evaluation may include one or more of the following: collecting a cell sample, fixing the cell sample, dehydrating, clearing, immobilizing the cell sample on a microscope slide, permeabilizing the cell sample, treating for analyte retrieval, staining, destaining, washing, blocking, and reacting with one or more capture reagent(s) in a buffered solution. In another embodiment, the cell sample is produced from a cell block.

In another embodiment, one or more capture reagent(s) specific to the corresponding biomarker(s) are used in a histological evaluation of a lung tissue sample, and the evaluation may include one or more of the following: collecting a tissue specimen, fixing the tissue sample, dehydrating, clearing, immobilizing the tissue sample on a microscope slide, permeabilizing the tissue sample, treating for analyte retrieval, staining, destaining, washing, blocking, rehydrating, and reacting with capture reagent(s) in a buffered solution. In another embodiment, fixing and dehydrating are replaced with freezing.

In another embodiment, the one or more aptamer(s) specific to the corresponding biomarker(s) are reacted with the histological or cytological sample and can serve as the nucleic acid target in a nucleic acid amplification method. Suitable nucleic acid amplification methods include, for example, PCR, q-beta replicase, rolling circle amplification, strand displacement, helicase dependent amplification, loop mediated isothermal amplification, ligase chain reaction, and restriction and circularization aided rolling circle amplification.

In one embodiment, the one or more capture reagent(s) specific to the corresponding biomarkers for use in the histological or cytological evaluation are mixed in a buffered solution that can include any of the following: blocking materials, competitors, detergents, stabilizers, carrier nucleic acid, polyanionic materials, etc.

A “cytology protocol” generally includes sample collection, sample fixation, sample immobilization, and staining. “Cell preparation” can include several processing steps after sample collection, including the use of one or more slow off-rate aptamers for the staining of the prepared cells.

Sample collection can include directly placing the sample in an untreated transport container, placing the sample in a transport container containing some type of media, or placing the sample directly onto a slide (immobilization) without any treatment or fixation.

Sample immobilization can be improved by applying a portion of the collected specimen to a glass slide that is treated with polylysine, gelatin, or a silane. Slides can be prepared by smearing a thin and even layer of cells across the slide. Care is generally taken to minimize mechanical distortion and drying artifacts. Liquid specimens can be processed in a cell block method. Alternatively, liquid specimens can be mixed 1:1 with the fixative solution for about 10 minutes at room temperature.

Cell blocks can be prepared from residual effusions, sputum, urine sediments, gastrointestinal fluids, cell scraping, or fine needle aspirates. Cells are concentrated or packed by centrifugation or membrane filtration. A number of methods for cell block preparation have been developed. Representative procedures include the fixed sediment, bacterial agar, or membrane filtration methods. In the fixed sediment method, the cell sediment is mixed with a fixative like Bouin's, picric acid, or buffered formalin and then the mixture is centrifuged to pellet the fixed cells. The supernatant is removed, drying the cell pellet as completely as possible. The pellet is collected and wrapped in lens paper and then placed in a tissue cassette. The tissue cassette is placed in a jar with additional fixative and processed as a tissue sample. The agar method is very similar, but the pellet is removed and dried on paper towel and then cut in half. The cut side is placed in a drop of melted agar on a glass slide and then the pellet is covered with agar, making sure that no bubbles form in the agar. The agar is allowed to harden and then any excess agar is trimmed away. This is placed in a tissue cassette and the tissue process is completed. Alternatively, the pellet may be directly suspended in 2% liquid agar at 65° C. and the sample centrifuged. The agar cell pellet is allowed to solidify for an hour at 4° C. The solid agar may be removed from the centrifuge tube and sliced in half. The agar is wrapped in filter paper and then placed in the tissue cassette. Processing from this point forward is as described above. Centrifugation can be replaced in any of these procedures with membrane filtration. Any of these processes may be used to generate a “cell block sample”.

Cell blocks can be prepared using specialized resins including Lowicryl resins, LR White, LR Gold, Unicryl, and MonoStep. These resins have low viscosity and can be polymerized at low temperatures and with ultraviolet (UV) light. The embedding process relies on progressively cooling the sample during dehydration, transferring the sample to the resin, and polymerizing a block at the final low temperature at the appropriate UV wavelength.

Cell block sections can be stained with hematoxylin-eosin for cytomorphological examination while additional sections are used for examination for specific markers.

Whether the process is cytological or histological, the sample may be fixed prior to additional processing to prevent sample degradation. This process is called “fixation” and describes a wide range of materials and procedures that may be used interchangeably. The sample fixation protocol and reagents are best selected empirically based on the targets to be detected and the specific cell/tissue type to be analyzed. Sample fixation relies on reagents such as ethanol, polyethylene glycol, methanol, formalin, or isopropanol. The samples should be fixed as soon after collection and affixation to the slide as possible. However, the fixative selected can introduce structural changes into various molecular targets, making their subsequent detection more difficult. The fixation and immobilization processes and their sequence can modify the appearance of the cell, and these changes must be anticipated and recognized by the cytotechnologist. Fixatives can cause shrinkage of certain cell types and cause the cytoplasm to appear granular or reticular. Many fixatives function by crosslinking cellular components. This can damage or modify specific epitopes, generate new epitopes, cause molecular associations, and reduce membrane permeability. Formalin fixation is one of the most common cytological/histological approaches. Formalin forms methylene bridges between neighboring proteins or within proteins. Precipitation or coagulation is also used for fixation, and ethanol is frequently used in this type of fixation. A combination of crosslinking and precipitation can also be used for fixation. A strong fixation process is best at preserving morphological information while a weaker fixation process is best for the preservation of molecular targets.

A representative fixative is 50% absolute ethanol, 2 mM polyethylene glycol (PEG), 1.85% formaldehyde. Variations on this formulation include ethanol (50% to 95%), methanol (20%-50%), and formalin (formaldehyde) only. Another common fixative is 2% PEG 1500, 50% ethanol, and 3% methanol. Slides are placed in the fixative for about 10 to 15 minutes at room temperature and then removed and allowed to dry. Once slides are fixed they can be rinsed with a buffered solution like PBS.

A wide range of dyes can be used to differentially highlight and contrast or “stain” cellular, sub-cellular, and tissue features or morphological structures. Hematoxylin is used to stain nuclei a blue or black color. Orange G-6 and Eosin Azure both stain the cell's cytoplasm. Orange G stains keratin and glycogen containing cells yellow. Eosin Y is used to stain nucleoli, cilia, red blood cells, and superficial epithelial squamous cells. Romanowsky stains are used for air dried slides and are useful in enhancing pleomorphism and distinguishing extracellular from intracytoplasmic material.

The staining process can include a treatment to increase the permeability of the cells to the stain. Treatment of the cells with a detergent can be used to increase permeability. To increase cell and tissue permeability, fixed samples can be further treated with solvents, saponins, or non-ionic detergents. Enzymatic digestion can also improve the accessibility of specific targets in a tissue sample.

After staining, the sample is dehydrated using a succession of alcohol rinses with increasing alcohol concentration. The final wash is done with xylene or a xylene substitute, such as a citrus terpene, that has a refractive index close to that of the coverslip to be applied to the slide. This final step is referred to as clearing. Once the sample is dehydrated and cleared, a mounting medium is applied. The mounting medium is selected to have a refractive index close to the glass and is capable of bonding the coverslip to the slide. It will also inhibit the additional drying, shrinking, or fading of the cell sample.

Regardless of the stains or processing used, the final evaluation of the lung cytological specimen is made by some type of microscopy to permit a visual inspection of the morphology and a determination of the marker's presence or absence. Exemplary microscopic methods include brightfield, phase contrast, fluorescence, and differential interference contrast.

If secondary tests are required on the sample after examination, the coverslip may be removed and the slide destained. Destaining involves using the solvent systems originally used in staining the slide, without the added dye and in the reverse order of the original staining procedure. Destaining may also be completed by soaking the slide in an acid alcohol until the cells are colorless. Once colorless, the slides are rinsed well in a water bath and the second staining procedure is applied.

In addition, specific molecular differentiation may be possible in conjunction with the cellular morphological analysis through the use of specific molecular reagents such as antibodies or nucleic acid probes or aptamers. This improves the accuracy of diagnostic cytology. Micro-dissection can be used to isolate a subset of cells for additional evaluation, in particular, for genetic evaluation of abnormal chromosomes, gene expression, or mutations.

Preparation of a tissue sample for histological evaluation involves fixation, dehydration, infiltration, embedding, and sectioning. The fixation reagents used in histology are very similar or identical to those used in cytology and have the same issues of preserving morphological features at the expense of molecular ones such as individual proteins. Time can be saved if the tissue sample is not fixed and dehydrated but instead is frozen and then sectioned while frozen. This is a more gentle processing procedure and can preserve more individual markers. However, freezing is not acceptable for long term storage of a tissue sample as subcellular information is lost due to the introduction of ice crystals. Ice in the frozen tissue sample also prevents the sectioning process from producing a very thin slice and thus some microscopic resolution and imaging of subcellular structures can be lost. In addition to formalin fixation, osmium tetroxide is used to fix and stain phospholipids (membranes).

Dehydration of tissues is accomplished with successive washes of increasing alcohol concentration. Clearing employs a material that is miscible with alcohol and the embedding material and involves a stepwise process starting at 50:50 alcohol:clearing reagent and then 100% clearing agent (xylene or xylene substitute). Infiltration involves incubating the tissue with a liquid form of the embedding agent (warm wax, nitrocellulose solution), first at 50:50 embedding agent:clearing agent and then 100% embedding agent. Embedding is completed by placing the tissue in a mold or cassette and filling with melted embedding agent such as wax, agar, or gelatin. The embedding agent is allowed to harden. The hardened tissue sample may then be sliced into thin sections for staining and subsequent examination.

Prior to staining, the tissue section is dewaxed and rehydrated. Xylene is used to dewax the section (one or more changes of xylene may be used), and the tissue is rehydrated by successive washes in alcohol of decreasing concentration. Prior to dewaxing, the tissue section may be heat immobilized to a glass slide at about 80° C. for about 20 minutes.

Laser capture micro-dissection allows the isolation of a subset of cells for further analysis from a tissue section.

As in cytology, to enhance the visualization of the microscopic features, the tissue section or slice can be stained with a variety of stains. A large menu of commercially available stains can be used to enhance or identify specific features.

To further increase the interaction of molecular reagents with cytological/histological samples, a number of techniques for “analyte retrieval” have been developed. The first such technique uses high temperature heating of a fixed sample. This method is also referred to as heat-induced epitope retrieval or HIER. A variety of heating techniques have been used, including steam heating, microwaving, autoclaving, water baths, and pressure cooking or a combination of these methods of heating. Analyte retrieval solutions include, for example, water, citrate, and normal saline buffers. The key to analyte retrieval is the time at high temperature, but lower temperatures for longer times have also been successfully used. Another key to analyte retrieval is the pH of the heating solution. Low pH has been found to provide the best immunostaining but also gives rise to backgrounds that frequently require the use of a second tissue section as a negative control. The most consistent benefit (increased immunostaining without increase in background) is generally obtained with a high pH solution regardless of the buffer composition. The analyte retrieval process for a specific target is empirically optimized for the target using heat, time, pH, and buffer composition as variables for process optimization. Using the microwave analyte retrieval method allows for sequential staining of different targets with antibody reagents. But the time required to achieve antibody and enzyme complexes between staining steps has also been shown to degrade cell membrane analytes. Microwave heating methods have improved in situ hybridization methods as well.

To initiate the analyte retrieval process, the section is first dewaxed and hydrated. The slide is then placed in 10 mM sodium citrate buffer pH 6.0 in a dish or jar. A representative procedure uses an 1100 W microwave and microwaves the slide at 100% power for 2 minutes followed by microwaving the slides using 20% power for 18 minutes after checking to be sure the slide remains covered in liquid. The slide is then allowed to cool in the uncovered container and then rinsed with distilled water. HIER may be used in combination with an enzymatic digestion to improve the reactivity of the target to immunochemical reagents.

One such enzymatic digestion protocol uses proteinase K. A 20 μg/ml concentration of proteinase K is prepared in 50 mM Tris Base, 1 mM EDTA, 0.5% Triton X-100, pH 8.0 buffer. The process first involves dewaxing sections in 2 changes of xylene, 5 minutes each. Then the sample is hydrated in 2 changes of 100% ethanol for 3 minutes each, 95% and 80% ethanol for 1 minute each, and then rinsed in distilled water. Sections are covered with Proteinase K working solution and incubated 10-20 minutes at 37° C. in a humidified chamber (optimal incubation time may vary depending on tissue type and degree of fixation). The sections are cooled at room temperature for 10 minutes and then rinsed in PBS Tween 20 for 2×2 min. If desired, sections can be blocked to eliminate potential interference from endogenous compounds and enzymes. The section is then incubated with primary antibody at appropriate dilution in primary antibody dilution buffer for 1 hour at room temperature or overnight at 4° C. The section is then rinsed with PBS Tween 20 for 2×2 min. Additional blocking can be performed, if required for the specific application, followed by additional rinsing with PBS Tween 20 for 3×2 min, and then finally the immunostaining protocol is completed.

A simple treatment with 1% SDS at room temperature has also been demonstrated to improve immunohistochemical staining. Analyte retrieval methods have been applied to slide mounted sections as well as free floating sections. Another treatment option is to place the slide in a jar containing citric acid and 0.1 Nonidet P40 at pH 6.0 and heat to 95° C. The slide is then washed with a buffer solution like PBS.

For immunological staining of tissues it may be useful to block non-specific association of the antibody with tissue proteins by soaking the section in a protein solution like serum or non-fat dry milk.

Blocking reactions may include the need to reduce the level of endogenous biotin; eliminate endogenous charge effects; inactivate endogenous nucleases; and/or inactivate endogenous enzymes like peroxidase and alkaline phosphatase. Endogenous nucleases may be inactivated by degradation with proteinase K, by heat treatment, use of a chelating agent such as EDTA or EGTA, the introduction of carrier DNA or RNA, treatment with a chaotrope such as urea, thiourea, guanidine hydrochloride, guanidine thiocyanate, lithium perchlorate, etc., or diethyl pyrocarbonate. Alkaline phosphatase may be inactivated by treatment with 0.1N HCl for 5 minutes at room temperature or treatment with 1 mM levamisole. Peroxidase activity may be eliminated by treatment with 0.03% hydrogen peroxide. Endogenous biotin may be blocked by soaking the slide or section in an avidin (streptavidin, neutravidin may be substituted) solution for at least 15 minutes at room temperature. The slide or section is then washed for at least 10 minutes in buffer. This may be repeated at least three times. Then the slide or section is soaked in a biotin solution for 10 minutes. This may be repeated at least three times with a fresh biotin solution each time. The buffer wash procedure is repeated. Blocking protocols should be minimized to prevent damaging either the cell or tissue structure or the target or targets of interest, but one or more of these protocols could be combined to “block” a slide or section prior to reaction with one or more slow off-rate aptamers. See Basic Medical Histology: the Biology of Cells, Tissues and Organs, authored by Richard G. Kessel, Oxford University Press, 1998.

Determination of Biomarker Values Using Mass Spectrometry Methods

A variety of configurations of mass spectrometers can be used to detect biomarker values. Several types of mass spectrometers are available or can be produced with various configurations. In general, a mass spectrometer has the following major components: a sample inlet, an ion source, a mass analyzer, a detector, a vacuum system, an instrument-control system, and a data system. Differences in the sample inlet, ion source, and mass analyzer generally define the type of instrument and its capabilities. For example, an inlet can be a capillary-column liquid chromatography source or can be a direct probe or stage such as used in matrix-assisted laser desorption. Common ion sources are, for example, electrospray, including nanospray and microspray, or matrix-assisted laser desorption. Common mass analyzers include a quadrupole mass filter, ion trap mass analyzer, and time-of-flight mass analyzer. Additional mass spectrometry methods are well known in the art (see Burlingame et al. Anal. Chem. 70:647R-716R (1998); Kinter and Sherman. Protein sequencing and identification using tandem mass spectrometry. New York: Wiley-Interscience (2000)).

Protein biomarkers and biomarker values can be detected and measured by any of the following: electrospray ionization mass spectrometry (ESI-MS), ESI-MS/MS, ESI-MS/(MS)n, matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF-MS), surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS), desorption/ionization on silicon (DIOS), secondary ion mass spectrometry (SIMS), quadrupole time-of-flight (Q-TOF), tandem time-of-flight (TOF/TOF) technology, called ultraflex III TOF/TOF, atmospheric pressure chemical ionization mass spectrometry (APCI-MS), APCI-MS/MS, APCI-(MS)^(N), atmospheric pressure photoionization mass spectrometry (APPI-MS), APPI-MS/MS, and APPI-(MS)^(N), quadrupole mass spectrometry, Fourier transform mass spectrometry (FTMS), quantitative mass spectrometry, and ion trap mass spectrometry.

Sample preparation strategies are used to label and enrich samples before mass spectroscopic characterization of protein biomarkers and determination of biomarker values. Labeling methods include but are not limited to isobaric tag for relative and absolute quantitation (iTRAQ) and stable isotope labeling with amino acids in cell culture (SILAC). Capture reagents used to selectively enrich samples for candidate biomarker proteins prior to mass spectroscopic analysis include but are not limited to aptamers, antibodies, nucleic acid probes, chimeras, small molecules, an F(ab′)₂ fragment, a single chain antibody fragment, an Fv fragment, a single chain Fv fragment, a nucleic acid, a lectin, a ligand-binding receptor, affibodies, nanobodies, ankyrins, domain antibodies, alternative antibody scaffolds (e.g., diabodies, etc.), imprinted polymers, avimers, peptidomimetics, peptoids, peptide nucleic acids, threose nucleic acid, a hormone receptor, a cytokine receptor, and synthetic receptors, and modifications and fragments of these.

The foregoing assays enable the detection of biomarker values that are useful in methods for diagnosing lung cancer, where the methods comprise detecting, in a biological sample from an individual, at least N biomarker values that each correspond to a biomarker selected from the group consisting of the biomarkers provided in Tables 18, 20 or 21, wherein a classification, as described in detail below, using the biomarker values indicates whether the individual has lung cancer. While certain of the described lung cancer biomarkers are useful alone for detecting and diagnosing lung cancer, methods are also described herein for the grouping of multiple subsets of the lung cancer biomarkers that are each useful as a panel of three or more biomarkers. Thus, various embodiments of the instant application provide combinations comprising N biomarkers, wherein N is at least three biomarkers. In other embodiments, N is selected to be any number from 2-86 biomarkers. It will be appreciated that N can be selected to be any number from any of the above described ranges, as well as similar, but higher order, ranges. In accordance with any of the methods described herein, biomarker values can be detected and classified individually or they can be detected and classified collectively, as for example in a multiplex assay format.

In another aspect, methods are provided for detecting an absence of lung cancer, the methods comprising detecting, in a biological sample from an individual, at least N biomarker values that each correspond to a biomarker selected from the group consisting of the biomarkers provided in Tables 18, 20 or 21, wherein a classification, as described in detail below, of the biomarker values indicates an absence of lung cancer in the individual. While certain of the described lung cancer biomarkers are useful alone for detecting and diagnosing the absence of lung cancer, methods are also described herein for the grouping of multiple subsets of the lung cancer biomarkers that are each useful as a panel of three or more biomarkers. Thus, various embodiments of the instant application provide combinations comprising N biomarkers, wherein N is at least three biomarkers. In other embodiments, N is selected to be any number from 2-86 biomarkers. It will be appreciated that N can be selected to be any number from any of the above described ranges, as well as similar, but higher order, ranges. In accordance with any of the methods described herein, biomarker values can be detected and classified individually or they can be detected and classified collectively, as for example in a multiplex assay format.

Classification of Biomarkers and Calculation of Disease Scores

A biomarker “signature” for a given diagnostic test contains a set of markers, each marker having different levels in the populations of interest. Different levels, in this context, may refer to different means of the marker levels for the individuals in two or more groups, or different variances in the two or more groups, or a combination of both. For the simplest form of a diagnostic test, these markers can be used to assign an unknown sample from an individual into one of two groups, either diseased or not diseased. The assignment of a sample into one of two or more groups is known as classification, and the procedure used to accomplish this assignment is known as a classifier or a classification method. Classification methods may also be referred to as scoring methods. There are many classification methods that can be used to construct a diagnostic classifier from a set of biomarker values. In general, classification methods are most easily performed using supervised learning techniques where a data set is collected using samples obtained from individuals within two (or more, for multiple classification states) distinct groups one wishes to distinguish. Since the class (group or population) to which each sample belongs is known in advance for each sample, the classification method can be trained to give the desired classification response. It is also possible to use unsupervised learning techniques to produce a diagnostic classifier.

Common approaches for developing diagnostic classifiers include decision trees; bagging+boosting+forests; rule inference based learning; Parzen Windows; linear models; logistic; neural network methods; unsupervised clustering; K-means; hierarchical ascending/descending; semi-supervised learning; prototype methods; nearest neighbor; kernel density estimation; support vector machines; hidden Markov models; Boltzmann Learning; and classifiers may be combined either simply or in ways which minimize particular objective functions. For a review, see, e.g., Pattern Classification, R. O. Duda, et al., editors, John Wiley & Sons, 2nd edition, 2001; see also, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, T. Hastie, et al., editors, Springer Science+Business Media, LLC, 2nd edition, 2009; each of which is incorporated by reference in its entirety.

To produce a classifier using supervised learning techniques, a set of samples called training data are obtained. In the context of diagnostic tests, training data includes samples from the distinct groups (classes) to which unknown samples will later be assigned. For example, samples collected from individuals in a control population and individuals in a particular disease population can constitute training data to develop a classifier that can classify unknown samples (or, more particularly, the individuals from whom the samples were obtained) as either having the disease or being free from the disease. The development of the classifier from the training data is known as training the classifier. Specific details on classifier training depend on the nature of the supervised learning technique. For purposes of illustration, an example of training a naïve Bayesian classifier will be described below (see, e.g., Pattern Classification, R. O. Duda, et al., editors, John Wiley & Sons, 2nd edition, 2001; see also, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, T. Hastie, et al., editors, Springer Science+Business Media, LLC, 2nd edition, 2009).

Since typically there are many more potential biomarker values than samples in a training set, care must be used to avoid over-fitting. Over-fitting occurs when a statistical model describes random error or noise instead of the underlying relationship. Over-fitting can be avoided in a variety of ways, including, for example, by limiting the number of markers used in developing the classifier, by assuming that the marker responses are independent of one another, by limiting the complexity of the underlying statistical model employed, and by ensuring that the underlying statistical model conforms to the data.

An illustrative example of the development of a diagnostic test using a set of biomarkers includes the application of a naïve Bayes classifier, a simple probabilistic classifier based on Bayes' theorem with strict independent treatment of the biomarkers. Each biomarker is described by a class-dependent probability density function (pdf) for the measured RFU (relative fluorescence units) values or log RFU values in each class. The joint pdf for the set of markers in one class is assumed to be the product of the individual class-dependent pdfs for each biomarker. Training a naïve Bayes classifier in this context amounts to assigning parameters (“parameterization”) to characterize the class-dependent pdfs. Any underlying model for the class-dependent pdfs may be used, but the model should generally conform to the data observed in the training set.
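
The parameterization step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the method of the application: it assumes normal class-dependent pdfs on log RFU values and maximum-likelihood estimates of each mean and variance. The function names (`fit_normal`, `train_class`, `joint_log_pdf`) and the toy training values are hypothetical.

```python
import math


def fit_normal(values):
    """Maximum-likelihood mean and variance for one biomarker in one class.

    Assumes at least two distinct values so that the variance is non-zero.
    """
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    return mu, var


def log_normal_pdf(x, mu, var):
    """Natural log of the normal pdf with mean mu and variance var, at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mu) ** 2 / (2.0 * var)


def train_class(samples):
    """Fit one (mean, variance) pair per biomarker for a single class.

    samples: list of samples, each a list of log RFU values with one entry
    per biomarker (hypothetical layout chosen for this sketch).
    """
    return [fit_normal(list(column)) for column in zip(*samples)]


def joint_log_pdf(x, params):
    """Log of the naive Bayes joint pdf: the sum of per-biomarker log pdfs,
    which is the log of the product of the individual pdfs."""
    return sum(log_normal_pdf(xi, mu, var) for xi, (mu, var) in zip(x, params))


# Toy, made-up training data: two control samples, two biomarkers each.
control_samples = [[1.0, 2.0], [3.0, 4.0]]
control_params = train_class(control_samples)
```

Working with sums of log pdfs rather than products of pdfs is a numerical convenience of this sketch; it avoids underflow when many biomarkers are combined.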

Specifically, the class-dependent probability of measuring a value x_(i) for biomarker i in the disease class is written as p(x_(i)|d) and the overall naïve Bayes probability of observing n markers with values $\underset{\sim}{x} = (x_{1}, x_{2}, \ldots, x_{n})$ is written as

${p\left( \underset{\sim}{x} \middle| d \right)} = {\prod\limits_{i = 1}^{n}{p\left( x_{i} \middle| d \right)}}$

where the individual x_(i)s are the measured biomarker levels in RFU or log RFU. The classification assignment for an unknown is facilitated by calculating the probability of being diseased, $p(d|\underset{\sim}{x})$, having measured $\underset{\sim}{x}$, compared to the probability of being disease free (control), $p(c|\underset{\sim}{x})$, for the same measured values. The ratio of these probabilities is computed from the class-dependent pdfs by application of Bayes' theorem, i.e.,

$\frac{p\left( c \middle| \underset{\sim}{x} \right)}{p\left( d \middle| \underset{\sim}{x} \right)} = \frac{{p\left( \underset{\sim}{x} \middle| c \right)}\left( {1 - {P(d)}} \right)}{{p\left( \underset{\sim}{x} \middle| d \right)}{P(d)}}$

where P(d) is the prevalence of the disease in the population appropriate to the test. Taking the logarithm of both sides of this ratio and substituting the naïve Bayes class-dependent probabilities from above gives

${\ln \; \frac{p\left( c \middle| \underset{\sim}{x} \right)}{p\left( d \middle| \underset{\sim}{x} \right)}} = {{\sum\limits_{i = 1}^{n}{\ln \; \frac{p\left( x_{i} \middle| c \right)}{p\left( x_{i} \middle| d \right)}}} + {\ln \; {\frac{\left( {1 - {P(d)}} \right)}{P(d)}.}}}$

This form is known as the log likelihood ratio and simply states that the log likelihood of being free of the particular disease versus having the disease is primarily composed of the sum of the individual log likelihood ratios of the n individual biomarkers. In its simplest form, an unknown sample (or, more particularly, the individual from whom the sample was obtained) is classified as being free of the disease if the above ratio is greater than zero and having the disease if the ratio is less than zero.

In one exemplary embodiment, the class-dependent biomarker pdfs $p(x_{i}|c)$ and $p(x_{i}|d)$ are assumed to be normal or log-normal distributions in the measured RFU values $x_{i}$, i.e.,

${p\left( x_{i} \middle| c \right)} = {\frac{1}{\sqrt{2\pi}\sigma_{c,i}}^{- \frac{{({x_{i} - \mu_{c,i}})}^{2}}{2\sigma_{c,i}^{2}}}}$

with a similar expression for $p(x_{i}|d)$ with $\mu_{d,i}$ and $\sigma_{d,i}^{2}$. Parameterization of the model requires estimation of two parameters for each class-dependent pdf, a mean μ and a variance σ², from the training data. This may be accomplished in a number of ways, including, for example, by maximum likelihood estimates, by least-squares, and by any other methods known to one skilled in the art. Substituting the normal distributions for $p(x_{i}|c)$ and $p(x_{i}|d)$ into the log-likelihood ratio defined above gives the following expression:

${\ln \; \frac{p\left( c \middle| \underset{\sim}{x} \right)}{p\left( d \middle| \underset{\sim}{x} \right)}} = {{\sum\limits_{i = 1}^{n}{\ln \; \frac{\sigma_{d,i}}{\sigma_{c,i}}}} - {\frac{1}{2}{\sum\limits_{i = 1}^{n}\left\lbrack {\left( \frac{x_{i} - \mu_{c,i}}{\sigma_{c,i}} \right)^{2} - \left( \frac{x_{i} - \mu_{d,i}}{\sigma_{d,i}} \right)^{2}} \right\rbrack}} + {\ln \; {\frac{\left( {1 - {P(d)}} \right)}{P(d)}.}}}$

Once a set of μs and σ²s has been defined for each pdf in each class from the training data and the disease prevalence in the population is specified, the Bayes classifier is fully determined and may be used to classify unknown samples with measured values $\underset{\sim}{x}$.
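The normal-distribution log-likelihood ratio above can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the use of population standard deviation as the maximum likelihood estimate, and any parameter values passed in are assumptions made for the example and are not taken from the tables of this application.

```python
import math
from statistics import fmean, pstdev

def fit_gaussian(values):
    """Maximum likelihood mean and standard deviation for one biomarker in one class."""
    return fmean(values), pstdev(values)

def log_likelihood_ratio(x, control_params, disease_params, prevalence):
    """ln[p(c|x) / p(d|x)] under the normal naive Bayes model.

    x: measured values, one per biomarker (RFU or log RFU).
    control_params, disease_params: one (mu, sigma) pair per biomarker.
    prevalence: P(d), the assumed disease prevalence.
    """
    total = math.log((1.0 - prevalence) / prevalence)
    for xi, (mu_c, sd_c), (mu_d, sd_d) in zip(x, control_params, disease_params):
        total += math.log(sd_d / sd_c)
        total -= 0.5 * (((xi - mu_c) / sd_c) ** 2 - ((xi - mu_d) / sd_d) ** 2)
    return total

def classify(x, control_params, disease_params, prevalence):
    """Control if the log ratio is greater than zero, otherwise disease."""
    llr = log_likelihood_ratio(x, control_params, disease_params, prevalence)
    return "control" if llr > 0 else "disease"
```

In use, `fit_gaussian` would be applied once per biomarker and per class to the training data, and the resulting parameter lists passed to `classify` together with the prevalence chosen for the intended population.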

The performance of the naïve Bayes classifier is dependent upon the number and quality of the biomarkers used to construct and train the classifier. A single biomarker will perform in accordance with its KS-distance (Kolmogorov-Smirnov), as defined in Example 3, below. If a classifier performance metric is defined as the sum of the sensitivity (fraction of true positives, $f_{TP}$) and specificity (one minus the fraction of false positives, $1 - f_{FP}$), a perfect classifier will have a score of two and a random classifier, on average, will have a score of one. Using the definition of the KS-distance, that value x* which maximizes the difference in the cdf functions can be found by solving

$\frac{\partial{KS}}{\partial x} = {\frac{\partial\left( {{{cdf}_{c}(x)} - {{cdf}_{d}(x)}} \right)}{\partial x} = 0}$

for x, which leads to p(x*|c)=p(x*|d), i.e., the KS distance occurs where the class-dependent pdfs cross. Substituting this value of x* into the expression for the KS-distance yields the following definition for KS:

$KS = {cdf}_{c}\left( x^{*} \right) - {cdf}_{d}\left( x^{*} \right) = \int_{-\infty}^{x^{*}} p\left( x \middle| c \right)dx - \int_{-\infty}^{x^{*}} p\left( x \middle| d \right)dx = 1 - \int_{x^{*}}^{\infty} p\left( x \middle| c \right)dx - \int_{-\infty}^{x^{*}} p\left( x \middle| d \right)dx = 1 - f_{FP} - f_{FN},$

that is, the KS distance is one minus the total fraction of errors using a test with a cut-off at x*, essentially a single analyte Bayesian classifier. Since we define a score of sensitivity+specificity=$2 - f_{FP} - f_{FN}$, combining this with the above definition of the KS-distance we see that sensitivity+specificity=1+KS. We select biomarkers with a statistic that is inherently suited for building naïve Bayes classifiers.
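The relationship sensitivity+specificity=1+KS can be checked numerically on empirical distributions. The following is a minimal sketch under stated assumptions: it follows the sign convention of the equations above (cdf of controls minus cdf of disease, i.e., a marker that is elevated in disease), searches for the cut-off only among observed values, and uses invented function names and example values.

```python
def ks_and_cutoff(control, disease):
    """Return (KS, x*) maximizing cdf_c(x) - cdf_d(x) over the observed values."""
    best_ks, best_x = 0.0, None
    for x in sorted(set(control) | set(disease)):
        cdf_c = sum(v <= x for v in control) / len(control)
        cdf_d = sum(v <= x for v in disease) / len(disease)
        if cdf_c - cdf_d > best_ks:
            best_ks, best_x = cdf_c - cdf_d, x
    return best_ks, best_x

def sensitivity_specificity(control, disease, cutoff):
    """Call a sample diseased when its value exceeds the cutoff."""
    sensitivity = sum(v > cutoff for v in disease) / len(disease)
    specificity = sum(v <= cutoff for v in control) / len(control)
    return sensitivity, specificity

# Hypothetical measurements, for illustration only.
control = [1, 2, 3, 6]
disease = [4, 5, 7, 8]
ks, cutoff = ks_and_cutoff(control, disease)
sens, spec = sensitivity_specificity(control, disease, cutoff)
```

For a marker that is down-regulated in disease, the same sketch applies with the two classes exchanged in the cdf difference and the inequality in the decision rule reversed.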

The addition of subsequent markers with good KS distances (>0.3, for example) will, in general, improve the classification performance if the subsequently added markers are independent of the first marker. Using the sensitivity plus specificity as a classifier score, it is straightforward to generate many high scoring classifiers with a variation of a greedy algorithm. (A greedy algorithm is any algorithm that follows the problem solving metaheuristic of making the locally optimal choice at each stage with the hope of finding the global optimum.)

The algorithmic approach used here is described in detail in Example 4. Briefly, all single analyte classifiers are generated from a table of potential biomarkers and added to a list. Next, all possible additions of a second analyte to each of the stored single analyte classifiers are performed, saving a predetermined number of the best scoring pairs, say, for example, a thousand, on a new list. All possible three marker classifiers are explored using this new list of the best two-marker classifiers, again saving the best thousand of these. This process continues until the score either plateaus or begins to deteriorate as additional markers are added. Those high scoring classifiers that remain after convergence can be evaluated for the desired performance for an intended use. For example, in one diagnostic application, classifiers with a high sensitivity and modest specificity may be more desirable than modest sensitivity and high specificity. In another diagnostic application, classifiers with a high specificity and a modest sensitivity may be more desirable. The desired level of performance is generally selected based upon a trade-off that must be made between the number of false positives and false negatives that can each be tolerated for the particular diagnostic application. Such trade-offs generally depend on the medical consequences of an error, either false positive or false negative.
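The search just described can be sketched as follows. This is an illustrative outline only, not the procedure of Example 4: the function name, the `score` callback interface, the default list size, and the stopping rule (stop when the best score no longer improves) are assumptions chosen for the example, and the marker list is assumed to be non-empty.

```python
def greedy_panels(markers, score, keep=1000, max_size=10):
    """Grow marker panels one analyte at a time, keeping the `keep` best per size.

    `score` is a caller-supplied function (hypothetical interface) that maps a
    tuple of marker names to a number such as sensitivity + specificity.
    Returns one ranked list of (score, panel) pairs per panel size explored.
    """
    # Start with every single-analyte classifier.
    current = sorted(((score((m,)), (m,)) for m in markers), reverse=True)[:keep]
    best_score = current[0][0]
    history = [current]
    for _ in range(1, max_size):
        candidates = {}
        for _, panel in current:
            for m in markers:
                if m not in panel:
                    grown = tuple(sorted(panel + (m,)))
                    if grown not in candidates:  # score each panel only once
                        candidates[grown] = score(grown)
        if not candidates:
            break
        ranked = sorted(((s, p) for p, s in candidates.items()), reverse=True)[:keep]
        if ranked[0][0] <= best_score:  # plateau or deterioration: stop
            break
        best_score = ranked[0][0]
        current = ranked
        history.append(current)
    return history
```

In practice the `score` callback would train and evaluate a naïve Bayes classifier on the given panel; here it is left abstract so that any performance metric can be substituted.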

Various other techniques are known in the art and may be employed to generate many potential classifiers from a list of biomarkers using a naïve Bayes classifier. In one embodiment, what is referred to as a genetic algorithm can be used to combine different markers using the fitness score as defined above. Genetic algorithms are particularly well suited to exploring a large diverse population of potential classifiers. In another embodiment, so-called ant colony optimization can be used to generate sets of classifiers. Other strategies that are known in the art can also be employed, including, for example, other evolutionary strategies as well as simulated annealing and other stochastic search methods. Metaheuristic methods, such as, for example, harmony search may also be employed.

Exemplary embodiments use any number of the lung cancer biomarkers listed in Tables 18, 20 or 21 in various combinations to produce diagnostic tests for detecting lung cancer (see Examples 2 and 6 for a detailed description of how these biomarkers were identified). In one embodiment, a method for diagnosing lung cancer uses a naïve Bayes classification method in conjunction with any number of the lung cancer biomarkers listed in Tables 18, 20 or 21. In an illustrative example (Example 3), the simplest test for detecting lung cancer from a population of asymptomatic smokers can be constructed using a single biomarker, for example, SCFsR which is down-regulated in lung cancer with a KS-distance of 0.37 (1+KS=1.37). Using the parameters $\mu_{c,i}$, $\sigma_{c,i}$, $\mu_{d,i}$ and $\sigma_{d,i}$ for SCFsR from Table 15 and the equation for the log-likelihood described above, a diagnostic test with a sensitivity of 63% and specificity of 73% (sensitivity+specificity=1.36) can be produced, see Table 14. The ROC curve for this test is displayed in FIG. 2 and has an AUC of 0.75.

Addition of biomarker HSP90a, for example, with a KS-distance of 0.5, significantly improves the classifier performance to a sensitivity of 76% and specificity of 75% (sensitivity+specificity=1.51) and an AUC=0.84. Note that the score for a classifier constructed of two biomarkers is not a simple sum of the KS-distances; KS-distances are not additive when combining biomarkers and it takes many more weak markers to achieve the same level of performance as a strong marker. Adding a third marker, ERBB1, for example, boosts the classifier performance to 78% sensitivity and 83% specificity and AUC=0.87. Adding additional biomarkers, such as, for example, PTN, BTK, CD30, Kallikrein 7, LRIG3, LDH-H1, and PARC, produces a series of lung cancer tests summarized in Table 14 and displayed as a series of ROC curves in FIG. 3. The score of the classifiers as a function of the number of analytes used in classifier construction is displayed in FIG. 4. The sensitivity and specificity of this exemplary ten-marker classifier are each >87% and the AUC is 0.91.

The markers listed in Tables 18, 20 or 21 can be combined in many ways to produce classifiers for diagnosing lung cancer. In some embodiments, panels of biomarkers are comprised of different numbers of analytes depending on a specific diagnostic performance criterion that is selected. For example, certain combinations of biomarkers will produce tests that are more sensitive (or more specific) than other combinations.

Once a panel is defined to include a particular set of biomarkers from Tables 18, 20 or 21 and a classifier is constructed from a set of training data, the definition of the diagnostic test is complete. In one embodiment, the procedure used to classify an unknown sample is outlined in FIG. 1A. In another embodiment the procedure used to classify an unknown sample is outlined in FIG. 1B. The biological sample is appropriately diluted and then run in one or more assays to produce the relevant quantitative biomarker levels used for classification. The measured biomarker levels are used as input for the classification method that outputs a classification and an optional score for the sample that reflects the confidence of the class assignment.

Table 21 identifies eighty-six biomarkers that are useful for diagnosing lung cancer in both tissue and blood samples. Table 20 identifies twenty-five biomarkers that were identified in tissue samples, but which are useful in serum and plasma samples as well. This is a surprisingly larger number than expected when compared to what is typically found during biomarker discovery efforts and may be attributable to the scale of the described study, which encompassed over 800 proteins measured in hundreds of individual samples, in some cases at concentrations in the low femtomolar range. Presumably, the large number of discovered biomarkers reflects the diverse biochemical pathways implicated in both tumor biology and the body's response to the tumor's presence; each pathway and process involves many proteins. The results show that no single protein or small group of proteins is uniquely informative about such complex processes; rather, multiple proteins are involved in relevant processes, such as apoptosis or extracellular matrix repair, for example.

Given the numerous biomarkers identified during the described study, one would expect to be able to derive large numbers of high-performing classifiers that can be used in various diagnostic methods. To test this notion, tens of thousands of classifiers were evaluated using the biomarkers in Table 1. As described in Example 4, many subsets of the biomarkers presented in Table 1 can be combined to generate useful classifiers. By way of example, descriptions are provided for classifiers containing 1, 2, and 3 biomarkers for each of two uses: lung cancer screening of smokers at high risk and diagnosis of individuals that have pulmonary nodules that are detectable by CT. As described in Example 4, all classifiers that were built using the biomarkers in Table 1 perform distinctly better than classifiers that were built using “non-markers”.

The performance of classifiers obtained by randomly excluding some of the markers in Table 1, which resulted in smaller subsets from which to build the classifiers, was also tested. As described in Example 4, Part 3, the classifiers that were built from random subsets of the markers in Table 1 performed similarly to optimal classifiers that were built using the full list of markers in Table 1.

The performance of ten-marker classifiers obtained by excluding the “best” individual markers from the ten-marker aggregation was also tested. As described in Example 4, Part 3, classifiers constructed without the “best” markers in Table 1 also performed well. Many subsets of the biomarkers listed in Table 1 performed close to optimally, even after removing the top 15 of the markers listed in the Table. This implies that the performance characteristics of any particular classifier are likely not due to some small core group of biomarkers and that the disease process likely impacts numerous biochemical pathways, which alters the expression level of many proteins.

The results from Example 4 suggest certain possible conclusions: First, the identification of a large number of biomarkers enables their aggregation into a vast number of classifiers that offer similarly high performance. Second, classifiers can be constructed such that particular biomarkers may be substituted for other biomarkers in a manner that reflects the redundancies that undoubtedly pervade the complexities of the underlying disease processes. That is to say, the information about the disease contributed by any individual biomarker identified in Table 1 overlaps with the information contributed by other biomarkers, such that it may be that no particular biomarker or small group of biomarkers in Table 1 must be included in any classifier.

Exemplary embodiments use naïve Bayes classifiers constructed from the data in Tables 38 and 39 to classify an unknown sample. The procedure is outlined in FIGS. 1A and B. In one embodiment, the biological sample is optionally diluted and run in a multiplexed aptamer assay. The data from the assay are normalized and calibrated as outlined in Example 3, and the resulting biomarker levels are used as input to a Bayes classification scheme. The log-likelihood ratio is computed for each measured biomarker individually and then summed to produce a final classification score, which is also referred to as a diagnostic score. The resulting assignment as well as the overall classification score can be reported. Optionally, the individual log-likelihood risk factors computed for each biomarker level can be reported as well. The details of the classification score calculation are presented in Example 3.

To demonstrate the utility of the aptamer-based proteomic technology described herein for use in discovery of disease-related biomarkers from tissues, homogenized tissue samples from surgical resections obtained from eight non-small cell lung cancer (NSCLC) patients were analyzed, as described in Example 6. All NSCLC patients were smokers, ranging in age from 47 to 75 years old and covering NSCLC stages 1A through 3B (Table 17). Three samples were obtained from each resection: tumor tissue, adjacent non-tumor tissue, and distant non-tumor tissue. Total protein concentration was adjusted and normalized in each homogenate for proteomic profiling, followed by analysis on the DNA microarray platform to measure the concentrations of over 800 human proteins (see Gold et al., Nature Precedings, http://precedings.nature.com/documents/4538/version/1 (2010)).

The protein concentration measurements, expressed as relative fluorescence units (RFU), allow large-scale comparisons of protein signatures among samples (see FIG. 21). With reference to FIG. 21, first the protein expression levels between the control adjacent and distant tissues were compared for each patient sample (FIG. 21A). In this comparison, only one analyte (fibrinogen) exhibited more than a two-fold difference between the samples. Overall, the signals generated by most analytes were similar in adjacent and distant tissue.

In contrast, comparison of tumor tissues with non-tumor tissue (adjacent or distant) identified 11 (1.3%) proteins with greater than four-fold differences and 53 (6.5%) proteins with greater than two-fold differences (see FIGS. 21B and 21C). The remaining 767 (93.5%) proteins showed relatively smaller differences between tumor and non-tumor tissue. Some proteins were substantially suppressed while others were elevated in tumor tissues compared to adjacent or distant tissues. Differential expression of proteins between adjacent and tumor tissue, or between distal and tumor tissue, was similar overall. Changes in distal tissue were generally larger (FIG. 21), which demonstrates that most protein changes are specific to the local tumor environment.
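The fold-change tallies above amount to counting analytes whose tumor to non-tumor signal ratio exceeds a threshold in either direction. A minimal sketch follows; the function name and the RFU values are hypothetical and are shown only to make the counting rule concrete.

```python
def count_fold_changes(tumor, non_tumor, fold):
    """Count analytes whose tumor/non-tumor ratio differs by more than `fold`,
    whether the analyte is elevated or suppressed in tumor tissue."""
    return sum(
        1
        for t, n in zip(tumor, non_tumor)
        if max(t / n, n / t) > fold
    )

# Hypothetical RFU values for four analytes, for illustration only.
tumor_rfu = [100.0, 400.0, 50.0, 900.0]
non_tumor_rfu = [100.0, 100.0, 100.0, 100.0]
more_than_two_fold = count_fold_changes(tumor_rfu, non_tumor_rfu, 2)
more_than_four_fold = count_fold_changes(tumor_rfu, non_tumor_rfu, 4)
```

Taking the larger of the ratio and its inverse treats a two-fold decrease the same way as a two-fold increase, which matches the statement that some proteins were suppressed while others were elevated.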

To identify NSCLC tissue biomarkers, protein expression levels between tumor, adjacent and distant tissue samples were compared using the Mann-Whitney test as described in Ostroff et al. (Nature Precedings, http://precedings.nature.com/documents/4537/version/1 (2010)). Thirty-six proteins with the greatest fold change and with statistically significant differences between tumor and non-tumor tissue were identified with a false discovery rate cutoff of q<0.05 for significance (FIGS. 23 and 24, and Table 18). Twenty of these proteins were up-regulated and 16 were down-regulated in tumor tissue. Although the number of samples used for this study was relatively small, a powerful individual-based study design in which each tumor sample had its own healthy tissue controls was employed. This eliminates the population variance associated with population-based study designs. The availability of appropriately chosen reference samples is increasingly recognized as a critically important component in biomarker discovery research (Bossuyt (2011) J. Am. Med. Assoc. 305:2229-30; Ioannidis and Panagiotou (2011) J. Am. Med. Assoc. 305:2200-10; Diamandis (2010) J. Natl. Cancer Inst. 102:1462-7).
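A false discovery rate cutoff of this kind can be applied to a list of per-protein p-values as sketched below. The text does not state which false discovery rate procedure was used; the Benjamini-Hochberg adjustment shown here is an assumption made for illustration, and the function name and p-values are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues

# Hypothetical p-values, e.g. from per-protein Mann-Whitney tests.
example_p = [0.01, 0.04, 0.03, 0.5]
example_q = bh_qvalues(example_p)
significant = [q < 0.05 for q in example_q]
```

Only analytes whose q-value falls below the chosen cutoff would then be carried forward as candidate biomarkers.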

It is believed that approximately one-third (13/36) of the NSCLC tissue biomarkers identified herein are novel. The remaining two-thirds (23/36) have been reported previously as differentially expressed proteins or genes in NSCLC tumor tissue (Table 18).

The biomarkers identified according to the method of Example 6 can be classified broadly into four biological processes associated with important hallmarks of tumor biology (Hanahan & Weinberg (2011) Cell 144:646-74) as shown in Table 19: 1) angiogenesis, 2) growth and metabolism, 3) inflammation and apoptosis, and 4) invasion and metastasis. Admittedly, these are convenient albeit inexact classifications that approximate a highly complex and dynamic system in which these molecules often play multiple and nuanced roles. Therefore, the specific state of a given system ultimately affects the expression and function of any particular molecule. Biological understanding is far from complete in these systems. With the SOMAscan platform, the quantitative expression of large numbers of proteins in various tissues and disease processes is made possible. These data provide new coordinates to help map the dynamics of these systems, which in turn will provide a more complete understanding of the biology of lung cancer as well as other diseases. The results from the current study provide a new perspective on NSCLC tumor biology, with both familiar and new elements.

Angiogenesis

Angiogenesis drives growth of new blood vessels to support tumor growth and metabolism. The regulation of angiogenesis is a complex biological phenomenon controlled by both positive and negative signals (Hanahan & Weinberg (2011) Cell 144:646-74). Among the NSCLC tissue biomarkers identified in this study were well known positive and negative angiogenesis regulators (FIGS. 23 and 24 and Table 19), all of which have been observed previously in NSCLC tumor tissue (Fontanini et al. (1999) British Journal of Cancer 79(2):363-369; Imoto et al. (1998) J. Thorac. Cardiovasc. Surg. 115:1007-1011; Ohta et al. (2006) Ann. Thorac. Surg. 82:1180-1184; Iizasa et al. (2004) Clinical Cancer Research 10:5361-5366). These include the prototypic angiogenesis inducer VEGF and inhibitors endostatin and thrombospondin-1 (TSP-1). VEGF is a powerful growth factor that promotes new blood vessel growth and was strongly up-regulated in NSCLC tumor tissue, consistent with previous observations (Imoto et al. (1998) J. Thorac. Cardiovasc. Surg. 115:1007-1011), including our study of serum samples from NSCLC patients (Ostroff et al. Nature Precedings, http://precedings.nature.com/documents/4537/version/1 (2010)). Endostatin is a proteolytic fragment of Collagen XVIII and a strong inhibitor of endothelial cell proliferation and angiogenesis (Iizasa et al. (August 2004) Clinical Cancer Research 10:5361-5366). TSP-1 and the related thrombospondin-2 (TSP-2) were substantially up-regulated in NSCLC tumor tissue. TSP-1 and TSP-2 are extracellular matrix proteins with complex, context-dependent effects modulated through a variety of interactions with cell-surface receptors, growth factors, cytokines, matrix metalloproteinases, and other molecules. Archetypically in model systems, TSP-1 and TSP-2 inhibit angiogenesis by inhibiting endothelial cell proliferation through the CD47 receptor and inducing endothelial cell apoptosis through the CD36 receptor. There is also evidence for proangiogenic influences for TSP-1 and TSP-2 (Bornstein (2009) J. Cell Commun. Signal. 3(3-4):189-200). Finally, reported TSP-1 and TSP-2 relative and absolute expression levels in NSCLC tissue vary (Chijiwa et al. (2009) Oncology Reports 22:279-283; Chen et al. (2009) J Int Med Res 37:551-556; Oshika 1998, Fontanini et al. (1999) British Journal of Cancer 79(2):363-369), likely due to their complex functions. In this study, it was found that CD36 was down-regulated in NSCLC tumor tissue, which could indicate an adaptation of tumor cells to reduce sensitivity to TSP-1 and TSP-2-mediated apoptosis.

Growth and Metabolism

Ten of the NSCLC biomarkers identified are associated with growth and metabolism functions. Half of these biomarkers are involved in the complex hormonal regulation of cellular growth and energy metabolism. Three insulin-like growth factor binding proteins (IGFBPs), which modulate the activity of insulin-like growth factors (IGFs), were up-regulated in NSCLC tumors (IGFBP-2, -5, and -7). Several reports have qualitatively assessed IGFBP-2, -5, and -7 in NSCLC (Table 18) and suggest higher expression in NSCLC tissue than in normal tissue. Insulin and IGFs are hormones that strongly influence cellular growth and metabolism, and cancer cells are often dependent on these molecules for growth and proliferation (Robert et al. (August 1999) Clinical Cancer Research 5:2094-2102; Liu et al. (June 2007) Lung Cancer 56(3):307-317; Singhal et al. (2008) Lung Cancer 60:313-324). These hormones are in turn degraded by insulysin, which we find up-regulated in NSCLC tumor tissue. The hormone adiponectin controls lipid metabolism and insulin sensitivity, and we found adiponectin down-regulated in NSCLC tumors. The remaining five biomarkers, carbonic anhydrase III, NAGK, TrATPase, tryptase β-2, and MAPK13, are enzymes with roles in cellular metabolism (Table 17).

Inflammation and Apoptosis

Inflammation and apoptosis are hallmarks of cancer biology, and a number of potential biomarkers associated with these processes were identified that have been associated previously with NSCLC (Table 19). Caspase-3, which has been associated with metastasis (Chen et al. (2010) Lung Cancer (doi:10.1016/j.lungcan.2010.10.015)), was found to be up-regulated in NSCLC tumor tissue. Another notable example is RAGE, which has been reported to be dramatically down-regulated in NSCLC tissue (Jing et al. (2010) Neoplasma 57:55-61; Bartling et al. (2005) Carcinogenesis 26:293-301). This finding is consistent with the measurement disclosed herein, in which sRAGE had the largest observed change for proteins that are lower in tumor than in non-malignant tissue. Although not limited by theory, one hypothesis is that RAGE plays a role in epithelial organization, and decreased levels of RAGE in lung tumors may contribute to loss of epithelial tissue structure, potentially leading to malignant transformation (Bartling et al. (2005) Carcinogenesis 26(2):293-301). Several chemokines, such as BCA-1, CXCL16, IL-8, and NAP-2, are altered (Table 18), consistent with the hypothesis that invasion of tumors with cells from the innate and adaptive arms of the immune system provides bioactive molecules that affect proliferative and angiogenic signals (Hanahan & Weinberg (2011) Cell 144:646-74).

Invasion and Metastasis

The largest group of potential biomarkers contains proteins that function in cell-cell and cell-matrix interactions and are involved in invasion and metastasis. Many have been previously reported to be associated with NSCLC. Most notable are two of the matrix metalloproteases, MMP-7 and MMP-12, which contribute to proteolytic degradation of extracellular matrix components and processing of substrates such as growth factors (see e.g. Su et al. (2004) Chinese Journal of Clinical Oncology 1(2):126-130; Wegmann et al. (1993) Eur. J. Cancer 29A(11):1578-1584). Such processes are well known to play a role in creating tumor microenvironments. It was found that both MMP-7 and MMP-12 were up-regulated in NSCLC tissue (Table 18), which is consistent with a similar study that used antibody-based measurements (Shah et al. (2010) The Journal of Thoracic and Cardiovascular Surgery 139(4):984-990). The over-expression of MMP-7 and MMP-12 has been associated with poor prognosis in NSCLC (Shah et al. (2010) The Journal of Thoracic and Cardiovascular Surgery 139(4):984-990). MMP-12 levels have been correlated with local recurrence and metastatic disease (Hofmann et al. (2005) Clin. Cancer Res. 11:1086-92; Hoffman et al. (2006) Oncol. Rep. 16:587-95). Two of the eight subjects studied had normal levels of MMP-12, whereas the other six had 15-50× elevation of MMP-12 in tumor tissue compared to non-tumor tissue.

Performance of NSCLC Biomarkers as Histochemistry Probes

An understanding of the differences in protein expression between tumor and non-tumor tissues can be used to identify novel histochemistry probes. Such probes can enable more precise molecular characterization of tumors and their effects on the surrounding stroma. FIG. 25 demonstrates the ability of two of the identified SOMAmers to stain fresh frozen tissues obtained from the same tumor resections used for the discovery of these biomarkers. Thrombospondin-2 (TSP2) was found to be increased in tumor tissue homogenates while macrophage mannose receptor (MRC1) was decreased. Tissue staining with these SOMAmers was consistent with the profiling results. Additional examples, as well as antibody confirmation of staining patterns, are shown in FIG. 27.

Comparison of NSCLC Tissue and Serum Biomarkers

Comparing the differential expression of proteins in sera of NSCLC patients relative to cancer-free controls with that in NSCLC tissue samples yields useful insights (FIG. 26). The most striking observation is that relative changes in protein expression are greater in tissues than in serum. This result could be expected since tumor tissue is the source of the changes in protein expression, which are then, even if the proteins are fully released into circulation, diluted many-fold into the total volume of blood. This trend is evident in the elongated distribution of data points along the x-axis in FIG. 26, in which the axes are drawn on the same scale to illustrate this point. Twelve of the analytes shown in FIGS. 23 and 24 as altered in tumor tissue are also differentially expressed in sera from NSCLC patients vs. controls (filled red circles in FIG. 26). Most of the directional changes are the same between tissue and sera, but a few are not. Local concentrations of proteins in a tissue homogenate clearly need not correlate with circulating levels of the proteins, and inverse correlations may provide clues regarding the redistribution of certain biomarkers in diseased versus normal tissues.

The discovery of novel biomarkers with demonstrable diagnostic or clinical utility has been a considerable challenge in recent years (Diamandis (2010) J. Natl. Cancer Inst. 102:1462-7). The reasons for this include the omnipresence of pre-analytical and analytical artifacts, unavailability of suitable healthy-state controls and unsophisticated study designs, and the difficulty of detecting small changes in protein levels at very low concentrations. This challenge is especially pronounced with cancer biomarkers where the objective is often to identify a tiny malignancy in a relatively large human body at an early stage. With regard to the latter point, one way to improve the chances of discovering true cancer biomarkers is to obtain protein expression data from both the source of the disease, such as tumor tissue, as well as from the circulation. The combined results can partially corroborate the validity of potential biomarkers. The instant application demonstrates that this is possible with the disclosed highly multiplexed and sensitive proteomic assay. It has been shown that tissues, like plasma or serum, are also amenable to SOMAscan, and the resulting comparative analysis of protein expression in NSCLC tumor tissues with surrounding healthy lung tissues offers a complement to the existing dataset of potential NSCLC biomarkers identified from serum samples (see U.S. Pub. No. 2010/0070191). In the instant case, one third, or twelve of the thirty-six tissue biomarkers reported herein (BCA-1 (BCL), cadherin-1 (cadherin-E), catalase, endostatin, IGFBP-2, MRC1 (macrophage mannose receptor), MAPK-13 (MK13), MMP-7, MMP-12, NAGK, VEGF and YES), have been previously identified in serum. Taken together, these data contribute to further understanding of the complexity of changes accompanying NSCLC and provide additional potential biomarkers for the early detection of this deadly disease.

Kits

Any combination of the biomarkers of Table 20 (as well as additional biomedical information) can be detected using a suitable kit, such as for use in performing the methods disclosed herein. Furthermore, any kit can contain one or more detectable labels as described herein, such as a fluorescent moiety, etc.

In one embodiment, a kit includes (a) one or more capture reagents (such as, for example, at least one aptamer or antibody) for detecting one or more biomarkers in a biological sample, wherein the biomarkers include any of the biomarkers set forth in Tables 18, 20 or 21 and optionally (b) one or more software or computer program products for classifying the individual from whom the biological sample was obtained as either having or not having lung cancer or for determining the likelihood that the individual has lung cancer, as further described herein. Alternatively, rather than one or more computer program products, one or more instructions for manually performing the above steps by a human can be provided.

The combination of a solid support with a corresponding capture reagent and a signal generating material is referred to herein as a “detection device” or “kit”. The kit can also include instructions for using the devices and reagents, handling the sample, and analyzing the data. Further, the kit may be used with a computer system or software to analyze and report the result of the analysis of the biological sample.

The kits can also contain one or more reagents (e.g., solubilization buffers, detergents, washes, or buffers) for processing a biological sample. Any of the kits described herein can also include, e.g., buffers, blocking agents, mass spectrometry matrix materials, antibody capture agents, positive control samples, negative control samples, software and information such as protocols, guidance and reference data.

In one aspect, the invention provides kits for the analysis of lung cancer status. The kits include PCR primers for one or more biomarkers selected from Tables 18, 20, or 21. The kit may further include instructions for use and correlation of the biomarkers with lung cancer. The kit may also include a DNA array containing the complement of one or more of the biomarkers selected from Table 20, reagents, and/or enzymes for amplifying or isolating sample DNA. The kits may include reagents for real-time PCR, for example, TaqMan probes and/or primers, and enzymes.

For example, a kit can comprise (a) reagents comprising at least one capture reagent for quantifying one or more biomarkers in a test sample, wherein said biomarkers comprise the biomarkers set forth in Tables 18, 20, or 21, or any other biomarkers or biomarker panels described herein, and optionally (b) one or more algorithms or computer programs for performing the steps of comparing the amount of each biomarker quantified in the test sample to one or more predetermined cutoffs and assigning a score for each biomarker quantified based on said comparison, combining the assigned scores for each biomarker quantified to obtain a total score, comparing the total score with a predetermined score, and using said comparison to determine whether an individual has lung cancer. Alternatively, rather than one or more algorithms or computer programs, one or more instructions for manually performing the above steps by a human can be provided.
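The scoring steps recited above (compare each biomarker to a cutoff, assign a score, sum the scores, compare the total with a predetermined score) can be sketched as follows. This is a minimal illustration only: the cutoff values, the sample values, the rule that a value above its cutoff scores 1, and the function names `total_score` and `has_lung_cancer` are all assumptions made for the example and are not taken from the application.

```python
from typing import Dict


def total_score(values: Dict[str, float], cutoffs: Dict[str, float]) -> int:
    """Assign 1 to each biomarker above its cutoff (0 otherwise) and sum the scores.

    Assumption: a higher value indicates disease for every marker.
    """
    return sum(1 if values[name] > cutoffs[name] else 0 for name in cutoffs)


def has_lung_cancer(values: Dict[str, float],
                    cutoffs: Dict[str, float],
                    predetermined_score: int) -> bool:
    """Compare the total score with a predetermined score (assumed rule: >=)."""
    return total_score(values, cutoffs) >= predetermined_score


# Hypothetical cutoffs and measurements in arbitrary units.
cutoffs = {"MMP-7": 1000.0, "MMP-12": 500.0, "VEGF": 800.0}
sample = {"MMP-7": 1500.0, "MMP-12": 450.0, "VEGF": 900.0}

print(total_score(sample, cutoffs))         # 2
print(has_lung_cancer(sample, cutoffs, 2))  # True
```

A marker that decreases with disease would need the comparison reversed, so a practical implementation would likely store a direction alongside each cutoff.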

Computer Methods and Software

Once a biomarker or biomarker panel is selected, a method for diagnosing an individual can comprise the following: 1) collect or otherwise obtain a biological sample; 2) perform an analytical method to detect and measure the biomarker or biomarkers in the panel in the biological sample; 3) perform any data normalization or standardization required for the method used to collect biomarker values; 4) calculate the marker score; 5) combine the marker scores to obtain a total diagnostic score; and 6) report the individual's diagnostic score. In this approach, the diagnostic score may be a single number determined from the sum of all the marker calculations that is compared to a preset threshold value that is an indication of the presence or absence of disease. Or the diagnostic score may be a series of bars that each represent a biomarker value and the pattern of the responses may be compared to a pre-set pattern for determination of the presence or absence of disease.

At least some embodiments of the methods described herein can be implemented with the use of a computer. An example of a computer system 100 is shown in FIG. 6. With reference to FIG. 6, system 100 is shown comprised of hardware elements that are electrically coupled via bus 108, including a processor 101, input device 102, output device 103, storage device 104, computer-readable storage media reader 105 a, communications system 106, processing acceleration (e.g., DSP or special-purpose processors) 107 and memory 109. Computer-readable storage media reader 105 a is further coupled to computer-readable storage media 105 b, the combination comprehensively representing remote, local, fixed and/or removable storage devices plus storage media, memory, etc. for temporarily and/or more permanently containing computer-readable information, which can include storage device 104, memory 109 and/or any other such accessible system 100 resource. System 100 also comprises software elements (shown as being currently located within working memory 191) including an operating system 192 and other code 193, such as programs, data and the like.

With respect to FIG. 6, system 100 has extensive flexibility andconfigurability. Thus, for example, a single architecture might beutilized to implement one or more servers that can be further configuredin accordance with currently desirable protocols, protocol variations,extensions, etc. However, it will be apparent to those skilled in theart that embodiments may well be utilized in accordance with morespecific application requirements. For example, one or more systemelements might be implemented as sub-elements within a system 100component (e.g., within communications system 106). Customized hardwaremight also be utilized and/or particular elements might be implementedin hardware, software or both. Further, while connection to othercomputing devices such as network input/output devices (not shown) maybe employed, it is to be understood that wired, wireless, modem, and/orother connection or connections to other computing devices might also beutilized.

In one aspect, the system can comprise a database containing features ofbiomarkers characteristic of lung cancer. The biomarker data (orbiomarker information) can be utilized as an input to the computer foruse as part of a computer implemented method. The biomarker data caninclude the data as described herein.

In one aspect, the system further comprises one or more devices forproviding input data to the one or more processors.

The system further comprises a memory for storing a data set of rankeddata elements.

In another aspect, the device for providing input data comprises a detector for detecting the characteristic of the data element, e.g., a mass spectrometer or gene chip reader.

The system additionally may comprise a database management system. Userrequests or queries can be formatted in an appropriate languageunderstood by the database management system that processes the query toextract the relevant information from the database of training sets.

The system may be connectable to a network to which a network server andone or more clients are connected. The network may be a local areanetwork (LAN) or a wide area network (WAN), as is known in the art.Preferably, the server includes the hardware necessary for runningcomputer program products (e.g., software) to access database data forprocessing user requests.

The system may include an operating system (e.g., UNIX or Linux) forexecuting instructions from a database management system. In one aspect,the operating system can operate on a global communications network,such as the internet, and utilize a global communications network serverto connect to such a network.

The system may include one or more devices that comprise a graphicaldisplay interface comprising interface elements such as buttons, pulldown menus, scroll bars, fields for entering text, and the like as areroutinely found in graphical user interfaces known in the art. Requestsentered on a user interface can be transmitted to an application programin the system for formatting to search for relevant information in oneor more of the system databases. Requests or queries entered by a usermay be constructed in any suitable database language.

The graphical user interface may be generated by a graphical userinterface code as part of the operating system and can be used to inputdata and/or to display inputted data. The result of processed data canbe displayed in the interface, printed on a printer in communicationwith the system, saved in a memory device, and/or transmitted over thenetwork or can be provided in the form of the computer readable medium.

The system can be in communication with an input device for providingdata regarding data elements to the system (e.g., expression values). Inone aspect, the input device can include a gene expression profilingsystem including, e.g., a mass spectrometer, gene chip or array reader,and the like.

The methods and apparatus for analyzing lung cancer biomarkerinformation according to various embodiments may be implemented in anysuitable manner, for example, using a computer program operating on acomputer system. A conventional computer system comprising a processorand a random access memory, such as a remotely-accessible applicationserver, network server, personal computer or workstation may be used.Additional computer system components may include memory devices orinformation storage systems, such as a mass storage system and a userinterface, for example a conventional monitor, keyboard and trackingdevice. The computer system may be a stand-alone system or part of anetwork of computers including a server and one or more databases.

The lung cancer biomarker analysis system can provide functions andoperations to complete data analysis, such as data gathering,processing, analysis, reporting and/or diagnosis. For example, in oneembodiment, the computer system can execute the computer program thatmay receive, store, search, analyze, and report information relating tothe lung cancer biomarkers. The computer program may comprise multiplemodules performing various functions or operations, such as a processingmodule for processing raw data and generating supplemental data and ananalysis module for analyzing raw data and supplemental data to generatea lung cancer status and/or diagnosis. Diagnosing lung cancer status maycomprise generating or collecting any other information, includingadditional biomedical information, regarding the condition of theindividual relative to the disease, identifying whether further testsmay be desirable, or otherwise evaluating the health status of theindividual.

Referring to FIG. 7, an example of a method of utilizing a computer inaccordance with principles of a disclosed embodiment can be seen. InFIG. 7, a flowchart 3000 is shown. In block 3004, biomarker informationcan be retrieved for an individual. The biomarker information can beretrieved from a computer database, for example, after testing of theindividual's biological sample is performed. The biomarker informationcan comprise biomarker values that each correspond to one of at least Nbiomarkers selected from a group consisting of the biomarkers providedin Table 18, wherein N=2-36, Table 20, wherein N=2-25 or Table 21,wherein N=2-86. In block 3008, a computer can be utilized to classifyeach of the biomarker values. And, in block 3012, a determination can bemade as to the likelihood that an individual has lung cancer based upona plurality of classifications. The indication can be output to adisplay or other indicating device so that it is viewable by a person.Thus, for example, it can be displayed on a display screen of a computeror other output device.

Referring to FIG. 8, an alternative method of utilizing a computer inaccordance with another embodiment can be illustrated via flowchart3200. In block 3204, a computer can be utilized to retrieve biomarkerinformation for an individual. The biomarker information comprises abiomarker value corresponding to a biomarker selected from the group ofbiomarkers provided in Tables 18, 20 or 21. In block 3208, aclassification of the biomarker value can be performed with thecomputer. And, in block 3212, an indication can be made as to thelikelihood that the individual has lung cancer based upon theclassification. The indication can be output to a display or otherindicating device so that it is viewable by a person. Thus, for example,it can be displayed on a display screen of a computer or other outputdevice.

Some embodiments described herein can be implemented so as to include acomputer program product. A computer program product may include acomputer readable medium having computer readable program code embodiedin the medium for causing an application program to execute on acomputer with a database.

As used herein, a “computer program product” refers to an organized setof instructions in the form of natural or programming languagestatements that are contained on a physical media of any nature (e.g.,written, electronic, magnetic, optical or otherwise) and that may beused with a computer or other automated data processing system. Suchprogramming language statements, when executed by a computer or dataprocessing system, cause the computer or data processing system to actin accordance with the particular content of the statements. Computerprogram products include without limitation: programs in source andobject code and/or test or data libraries embedded in a computerreadable medium. Furthermore, the computer program product that enablesa computer system or data processing equipment device to act inpre-selected ways may be provided in a number of forms, including, butnot limited to, original source code, assembly code, object code,machine language, encrypted or compressed versions of the foregoing andany and all equivalents.

In one aspect, a computer program product is provided for indicating a likelihood of lung cancer. The computer program product includes a computer readable medium embodying program code executable by a processor of a computing device or system, the program code comprising: code that retrieves data attributed to a biological sample from an individual, wherein the data comprises biomarker values that each correspond to one of at least N biomarkers in the biological sample selected from the group of biomarkers provided in Table 18, wherein N=2-36, Table 20, wherein N=2-25 or Table 21, wherein N=2-86; and code that executes a classification method that indicates a lung disease status of the individual as a function of the biomarker values.

In still another aspect, a computer program product is provided for indicating a likelihood of lung cancer. The computer program product includes a computer readable medium embodying program code executable by a processor of a computing device or system, the program code comprising: code that retrieves data attributed to a biological sample from an individual, wherein the data comprises a biomarker value corresponding to a biomarker in the biological sample selected from the group of biomarkers provided in Table 18, wherein N=2-36, Table 20, wherein N=2-25 or Table 21, wherein N=2-86; and code that executes a classification method that indicates a lung disease status of the individual as a function of the biomarker value.

While various embodiments have been described as methods or apparatuses,it should be understood that embodiments can be implemented through codecoupled with a computer, e.g., code resident on a computer or accessibleby the computer. For example, software and databases could be utilizedto implement many of the methods discussed above. Thus, in addition toembodiments accomplished by hardware, it is also noted that theseembodiments can be accomplished through the use of an article ofmanufacture comprised of a computer usable medium having a computerreadable program code embodied therein, which causes the enablement ofthe functions disclosed in this description. Therefore, it is desiredthat embodiments also be considered protected by this patent in theirprogram code means as well. Furthermore, the embodiments may be embodiedas code stored in a computer-readable memory of virtually any kindincluding, without limitation, RAM, ROM, magnetic media, optical media,or magneto-optical media. Even more generally, the embodiments could beimplemented in software, or in hardware, or any combination thereofincluding, but not limited to, software running on a general purposeprocessor, microcode, PLAs, or ASICs.

It is also envisioned that embodiments could be accomplished as computersignals embodied in a carrier wave, as well as signals (e.g., electricaland optical) propagated through a transmission medium. Thus, the varioustypes of information discussed above could be formatted in a structure,such as a data structure, and transmitted as an electrical signalthrough a transmission medium or stored on a computer readable medium.

It is also noted that many of the structures, materials, and actsrecited herein can be recited as means for performing a function or stepfor performing a function. Therefore, it should be understood that suchlanguage is entitled to cover all such structures, materials, or actsdisclosed within this specification and their equivalents, including thematter incorporated by reference.

EXAMPLES

The following examples are provided for illustrative purposes only andare not intended to limit the scope of the application as defined by theappended claims. All examples described herein were carried out usingstandard techniques, which are well known and routine to those of skillin the art. Routine molecular biology techniques described in thefollowing examples can be carried out as described in standardlaboratory manuals, such as Sambrook et al., Molecular Cloning: ALaboratory Manual, 3rd. ed., Cold Spring Harbor Laboratory Press, ColdSpring Harbor, N.Y., (2001).

Example 1 Multiplexed Aptamer Analysis of Samples for Lung CancerBiomarker Selection

This example describes the multiplex aptamer assay used to analyze the samples and controls for the identification of the biomarkers set forth in Table 1, Col. 2 (see FIG. 9). In this case, the multiplexed analysis utilized 820 aptamers, each unique to a specific target.

In this method, pipette tips were changed for each solution addition.

Also, unless otherwise indicated, most solution transfers and wash additions used the 96-well head of a Beckman Biomek Fx^(P). Method steps manually pipetted used a twelve channel P200 Pipetteman (Rainin Instruments, LLC, Oakland, Calif.), unless otherwise indicated. A custom buffer referred to as SB17 was prepared in-house, comprising 40 mM HEPES, 100 mM NaCl, 5 mM KCl, 5 mM MgCl₂, 1 mM EDTA at pH 7.5. All steps were performed at room temperature unless otherwise indicated.

1. Preparation of Aptamer Stock Solution

For aptamers without a photo-cleavable biotin linker, custom stock aptamer solutions for 10%, 1% and 0.03% serum were prepared at 8× concentration in 1×SB17, 0.05% Tween-20 with appropriate photo-cleavable, biotinylated primers, where the resultant primer concentration was 3 times the relevant aptamer concentration. The primers hybridized to all or part of the corresponding aptamer.

Each of the three 8× aptamer solutions was diluted separately 1:4 into 1×SB17, 0.05% Tween-20 (1500 μL of 8× stock into 4500 μL of 1×SB17, 0.05% Tween-20) to achieve a 2× concentration. Each diluted aptamer master mix was then split, 1500 μL each, into four 2 mL screw cap tubes and brought to 95° C. for 5 minutes, followed by a 37° C. incubation for 15 minutes. After incubation, the four 2 mL tubes corresponding to a particular aptamer master mix were combined into a reagent trough, and 55 μL of a 2× aptamer mix (for all three mixes) was manually pipetted into a 96-well Hybaid plate and the plate foil sealed. The final result was three 96-well, foil-sealed Hybaid plates. The individual aptamer concentration ranged from 0.5-4 nM as indicated in Table 2.

2. Assay Sample Preparation

Frozen aliquots of 100% serum, stored at −80° C., were placed in a 25° C. water bath for 10 minutes. Thawed samples were placed on ice, gently vortexed (set on 4) for 8 seconds and then replaced on ice.

A 20% sample solution was prepared by transferring 16 μL of sample using a 50 μL 8-channel spanning pipettor into 96-well Hybaid plates, each well containing 64 μL of the appropriate sample diluent at 4° C. (0.8×SB17, 0.05% Tween-20, 2 μM Z-block_2, 0.6 mM MgCl₂ for serum). This plate was stored on ice until the next sample dilution steps were initiated.

To commence sample and aptamer equilibration, the 20% sample plate was briefly centrifuged and placed on the Beckman FX where it was mixed by pipetting up and down with the 96-well pipettor. A 2% sample was then prepared by diluting 10 μL of the 20% sample into 90 μL of 1×SB17, 0.05% Tween-20. Next, dilution of 6 μL of the resultant 2% sample into 194 μL of 1×SB17, 0.05% Tween-20 made a 0.06% sample plate. Dilutions were done on the Beckman Biomek Fx^(P). After each transfer, the solutions were mixed by pipetting up and down. The 3 sample dilution plates were then transferred to their respective aptamer solutions by adding 55 μL of the sample to 55 μL of the appropriate 2× aptamer mix. The sample and aptamer solutions were mixed on the robot by pipetting up and down.
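The dilution series described above can be checked arithmetically. The short sketch below uses only the volumes stated in the text; the helper name `dilute` is hypothetical and chosen for illustration. It confirms that the 20%, 2% and 0.06% sample plates, once mixed 1:1 with the 2× aptamer mixes, give the 10%, 1% and 0.03% final serum concentrations used in the equilibration binding reactions.

```python
def dilute(concentration_pct: float, transfer_ul: float, diluent_ul: float) -> float:
    """Concentration (%) after adding transfer_ul of sample to diluent_ul of diluent."""
    return concentration_pct * transfer_ul / (transfer_ul + diluent_ul)


neat_serum = 100.0
plate_20 = dilute(neat_serum, 16, 64)   # 16 uL serum into 64 uL diluent
plate_2 = dilute(plate_20, 10, 90)      # 10 uL of the 20% sample into 90 uL buffer
plate_006 = dilute(plate_2, 6, 194)     # 6 uL of the 2% sample into 194 uL buffer

# Each sample plate is then mixed 55 uL + 55 uL with the 2x aptamer mix.
final = [dilute(p, 55, 55) for p in (plate_20, plate_2, plate_006)]
print([round(f, 4) for f in final])     # [10.0, 1.0, 0.03]
```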

3. Sample Equilibration Binding

The sample/aptamer plates were foil sealed and placed into a 37° C.incubator for 3.5 hours before proceeding to the Catch 1 step.

4. Preparation of Catch 2 Bead Plate

An 11 mL aliquot of MyOne (Invitrogen Corp., Carlsbad, Calif.) Streptavidin C1 beads was washed 2 times with equal volumes of 20 mM NaOH (5 minute incubation for each wash), 3 times with equal volumes of 1×SB17, 0.05% Tween-20 and resuspended in 11 mL 1×SB17, 0.05% Tween-20. Using a 12-span multichannel pipettor, 50 μL of this solution was manually pipetted into each well of a 96-well Hybaid plate. The plate was then covered with foil and stored at 4° C. for use in the assay.

5. Preparation of Catch 1 Bead Plates

Three 0.45 μm Millipore HV plates (Durapore membrane, Cat# MAHVN4550) were equilibrated with 100 μL of 1×SB17, 0.05% Tween-20 for at least 10 minutes. The equilibration buffer was then filtered through the plate and 133.3 μL of a 7.5% Streptavidin-agarose bead slurry (in 1×SB17, 0.05% Tween-20) was added into each well. To keep the streptavidin-agarose beads suspended while transferring them into the filter plate, the bead solution was manually mixed with a 200 μL, 12-channel pipettor, 15 times. After the beads were distributed across the 3 filter plates, a vacuum was applied to remove the bead supernatant. Finally, the beads were washed in the filter plates with 200 μL 1×SB17, 0.05% Tween-20 and then resuspended in 200 μL 1×SB17, 0.05% Tween-20. The bottoms of the filter plates were blotted and the plates stored for use in the assay.

6. Loading the Cytomat

The cytomat was loaded with all tips, plates, all reagents in troughs(except NHS-biotin reagent which was prepared fresh right beforeaddition to the plates), 3 prepared catch 1 filter plates and 1 preparedMyOne plate.

7. Catch 1

After a 3.5 hour equilibration time, the sample/aptamer plates were removed from the incubator, centrifuged for about 1 minute, foil removed, and placed on the deck of the Beckman Biomek Fx^(P). The Beckman Biomek Fx^(P) program was initiated. All subsequent steps in Catch 1 were performed by the Beckman Biomek Fx^(P) robot unless otherwise noted. Within the program, the vacuum was applied to the Catch 1 filter plates to remove the bead supernatant. One hundred microlitres of each of the 10%, 1% and 0.03% equilibration binding reactions were added to their respective Catch 1 filtration plates, and each plate was mixed using an on-deck orbital shaker at 800 rpm for 10 minutes.

Unbound solution was removed via vacuum filtration. The catch 1 beadswere washed with 190 μL of 100 μM biotin in 1×SB17, 0.05% Tween-20followed by 190 μL of 1×SB17, 0.05% Tween-20 by dispensing the solutionand immediately drawing a vacuum to filter the solution through theplate.

Next, 190 μL 1×SB17, 0.05% Tween-20 was added to the Catch 1 plates.Plates were blotted to remove droplets using an on-deck blot station andthen incubated with orbital shakers at 800 rpm for 10 minutes at 25° C.

The robot removed this wash via vacuum filtration and blotted the bottomof the filter plate to remove droplets using the on-deck blot station.

8. Tagging

An NHS-PEO4-biotin aliquot was thawed at 37° C. for 6 minutes and then diluted 1:100 with tagging buffer (SB17 at pH 7.25, 0.05% Tween-20). The NHS-PEO4-biotin reagent was dissolved at 100 mM concentration in anhydrous DMSO and had been stored frozen at −20° C. Upon a robot prompt, the diluted NHS-PEO4-biotin reagent was manually added to an on-deck trough and the robot program was manually re-initiated to dispense 100 μL of the NHS-PEO4-biotin into each well of each Catch 1 filter plate. This solution was allowed to incubate with Catch 1 beads shaking at 800 rpm for 5 minutes on the orbital shakers.

9. Kinetic Challenge and Photo-cleavage

The tagging reaction was quenched by the addition of 150 μL of 20 mMglycine in 1×SB17, 0.05% Tween-20 to the Catch 1 plates while stillcontaining the NHS tag. The plates were then incubated for 1 minute onorbital shakers at 800 rpm. The NHS-tag/glycine solution was removed viavacuum filtration. Next, 190 μL 20 mM glycine (1×SB17, 0.05% Tween-20)was added to each plate and incubated for 1 minute on orbital shakers at800 rpm before removal by vacuum filtration.

190 μL of 1×SB17, 0.05% Tween-20 was added to each plate and removed byvacuum filtration.

The wells of the Catch 1 plates were subsequently washed three times byadding 190 μL 1×SB17, 0.05% Tween-20, placing the plates on orbitalshakers for 1 minute at 800 rpm followed by vacuum filtration. After thelast wash the plates were placed on top of a 1 mL deep-well plate andremoved from the deck. The Catch 1 plates were centrifuged at 1000 rpmfor 1 minute to remove as much extraneous volume from the agarose beadsbefore elution as possible.

The plates were placed back onto the Beckman Biomek Fx^(P) and 85 μL of10 mM DxSO₄ in 1×SB17, 0.05% Tween-20 was added to each well of thefilter plates.

The filter plates were removed from the deck, placed onto a VariomagThermoshaker (Thermo Fisher Scientific, Inc., Waltham, Mass.) under theBlackRay (Ted Pella, Inc., Redding, Calif.) light sources, andirradiated for 10 minutes while shaking at 800 rpm.

The photocleaved solutions were sequentially eluted from each Catch 1plate into a common deep well plate by first placing the 10% Catch 1filter plate on top of a 1 mL deep-well plate and centrifuging at 1000rpm for 1 minute. The 1% and 0.03% catch 1 plates were then sequentiallycentrifuged into the same deep well plate.

10. Catch 2 Bead Capture

The 1 mL deep well block containing the combined eluates of catch 1 wasplaced on the deck of the Beckman Biomek Fx^(P) for catch 2.

The robot transferred all of the photo-cleaved eluate from the 1 mLdeep-well plate onto the Hybaid plate containing the previously preparedcatch 2 MyOne magnetic beads (after removal of the MyOne buffer viamagnetic separation).

The solution was incubated while shaking at 1350 rpm for 5 minutes at25° C. on a Variomag Thermoshaker (Thermo Fisher Scientific, Inc.,Waltham, Mass.).

The robot transferred the plate to the on deck magnetic separatorstation. The plate was incubated on the magnet for 90 seconds beforeremoval and discarding of the supernatant.

11. 37° C. 30% Glycerol Washes

The catch 2 plate was moved to the on-deck thermal shaker and 75 μL of1×SB17, 0.05% Tween-20 was transferred to each well. The plate was mixedfor 1 minute at 1350 rpm and 37° C. to resuspend and warm the beads. Toeach well of the catch 2 plate, 75 μL of 60% glycerol at 37° C. wastransferred and the plate continued to mix for another minute at 1350rpm and 37° C. The robot transferred the plate to the 37° C. magneticseparator where it was incubated on the magnet for 2 minutes and thenthe robot removed and discarded the supernatant. These washes wererepeated two more times.

After removal of the third 30% glycerol wash from the catch 2 beads, 150μL of 1×SB17, 0.05% Tween-20 was added to each well and incubated at 37°C., shaking at 1350 rpm for 1 minute, before removal by magneticseparation on the 37° C. magnet.

The catch 2 beads were washed a final time using 150 μL 1×SB19, 0.05%Tween-20 with incubation for 1 minute while shaking at 1350 rpm, priorto magnetic separation.

12. Catch 2 Bead Elution and Neutralization

The aptamers were eluted from catch 2 beads by adding 105 μL of 100 mMCAPSO with 1M NaCl, 0.05% Tween-20 to each well. The beads wereincubated with this solution with shaking at 1300 rpm for 5 minutes.

The catch 2 plate was then placed onto the magnetic separator for 90seconds prior to transferring 90 μL of the eluate to a new 96-well platecontaining 10 μL of 500 mM HCl, 500 mM HEPES, 0.05% Tween-20 in eachwell. After transfer, the solution was mixed robotically by pipetting 90μL up and down five times.

13. Hybridization

The Beckman Biomek Fx^(P) transferred 20 μL of the neutralized catch 2 eluate to a fresh Hybaid plate, and 5 μL of 10× Agilent Block, containing a 10× spike of hybridization controls, was added to each well. Next, 25 μL of 2× Agilent Hybridization buffer was manually pipetted to each well of the plate containing the neutralized samples and blocking buffer and the solution was mixed by manually pipetting 25 μL up and down 15 times slowly to avoid extensive bubble formation. The plate was spun at 1000 rpm for 1 minute.

A gasket slide was placed into an Agilent hybridization chamber and 40μL of each of the samples containing hybridization and blocking solutionwas manually pipetted into each gasket. An 8-channel variable spanningpipettor was used in a manner intended to minimize bubble formation.Custom Agilent microarray slides (Agilent Technologies, Inc., SantaClara, Calif.), with their Number Barcode facing up, were then slowlylowered onto the gasket slides (see Agilent manual for detaileddescription).

The tops of the hybridization chambers were placed onto the slide/backing sandwich and clamping brackets slid over the whole assembly. These assemblies were tightly clamped by turning the screws securely.

Each slide/backing slide sandwich was visually inspected to ensure the solution bubble could move freely within the sample. If the bubble did not move freely, the hybridization chamber assembly was gently tapped to disengage bubbles lodged near the gasket.

The assembled hybridization chambers were incubated in an Agilenthybridization oven for 19 hours at 60° C. rotating at 20 rpm.

14. Post Hybridization Washing

Approximately 400 mL Agilent Wash Buffer 1 was placed into each of twoseparate glass staining dishes. One of the staining dishes was placed ona magnetic stir plate and a slide rack and stir bar were placed into thebuffer.

A staining dish for Agilent Wash 2 was prepared by placing a stir barinto an empty glass staining dish.

A fourth glass staining dish was set aside for the final acetonitrilewash.

Each of six hybridization chambers was disassembled. One-by-one, theslide/backing sandwich was removed from its hybridization chamber andsubmerged into the staining dish containing Wash 1. The slide/backingsandwich was pried apart using a pair of tweezers, while stillsubmerging the microarray slide. The slide was quickly transferred intothe slide rack in the Wash 1 staining dish on the magnetic stir plate.

The slide rack was gently raised and lowered 5 times. The magneticstirrer was turned on at a low setting and the slides incubated for 5minutes.

With one minute remaining in Wash 1, Wash Buffer 2 pre-warmed to 37° C. in an incubator was added to the second prepared staining dish. The slide rack was quickly transferred to Wash Buffer 2 and any excess buffer on the bottom of the rack was removed by scraping it on the top of the stain dish. The slide rack was gently raised and lowered 5 times. The magnetic stirrer was turned on at a low setting and the slides incubated for 5 minutes.

The slide rack was slowly pulled out of Wash 2, taking approximately 15seconds to remove the slides from the solution.

With one minute remaining in Wash 2, acetonitrile (ACN) was added to the fourth staining dish. The slide rack was transferred to the acetonitrile stain dish. The slide rack was gently raised and lowered 5 times. The magnetic stirrer was turned on at a low setting and the slides incubated for 5 minutes.

The slide rack was slowly pulled out of the ACN stain dish and placed onan absorbent towel. The bottom edges of the slides were quickly driedand the slide was placed into a clean slide box.

15. Microarray Imaging

The microarray slides were placed into Agilent scanner slide holders andloaded into the Agilent Microarray scanner according to themanufacturer's instructions.

The slides were imaged in the Cy3-channel at 5 μm resolution at the 100%PMT setting and the XRD option enabled at 0.05. The resulting tiffimages were processed using Agilent feature extraction software version10.5.

Example 2 Biomarker Identification

The identification of potential lung cancer biomarkers was performed for three different diagnostic applications: diagnosis of suspicious nodules from a CT scan, screening of asymptomatic smokers for lung cancer, and diagnosing an individual with lung cancer. Serum samples were collected from four different sites in support of these three applications and include 48 NSCLC cases and 218 high risk controls composed of heavy smokers and patients with benign nodules. The multiplexed aptamer affinity assay as described in Example 1 was used to measure and report the RFU value for 820 analytes in each of these 264 samples. The KS-test was then applied to each analyte. The KS-distance (Kolmogorov-Smirnov statistic) between values from two sets of samples is a nonparametric measurement of the extent to which the empirical distribution of the values from one set (Set A) differs from the distribution of values from the other set (Set B). For any value of a threshold T some proportion of the values from Set A will be less than T, and some proportion of the values from Set B will be less than T. The KS-distance measures the maximum (unsigned) difference between the proportion of the values from the two sets for any choice of T.
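The KS-distance defined above can be computed directly from two lists of values. The following is a minimal sketch, not the code used to generate the results in this application; the function name `ks_distance` and the example values are hypothetical. It evaluates the two empirical distributions at every observed value, which is sufficient because the difference in proportions can only change at an observed value.

```python
from bisect import bisect_right
from typing import Sequence


def ks_distance(set_a: Sequence[float], set_b: Sequence[float]) -> float:
    """Maximum unsigned difference between the empirical distributions of two sets."""
    a, b = sorted(set_a), sorted(set_b)
    best = 0.0
    for t in a + b:  # candidate thresholds T: every observed value
        frac_a = bisect_right(a, t) / len(a)  # proportion of Set A at or below T
        frac_b = bisect_right(b, t) / len(b)  # proportion of Set B at or below T
        best = max(best, abs(frac_a - frac_b))
    return best


# Hypothetical values for one analyte in two groups.
controls = [1.0, 2.0, 3.0, 4.0]
cases = [3.0, 4.0, 5.0, 6.0]
print(ks_distance(controls, cases))  # 0.5
```

A KS-distance of 0 means the two empirical distributions coincide, and a value of 1 means the two sets do not overlap at all.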

Sets of biomarkers can be used to build classifiers that assign samples to either a control or disease group. In fact, many such classifiers were produced from these sets of biomarkers and the frequency with which any biomarker was used in good scoring classifiers was determined. Those biomarkers that occurred most frequently among the top scoring classifiers were the most useful for creating a diagnostic test. In this example, Bayesian classifiers were used to explore the classification space but many other supervised learning techniques may be employed for this purpose. The scoring fitness of any individual classifier was gauged using the area under the receiver operating characteristic curve (AUC of ROC) of the classifier at the Bayesian surface assuming a disease prevalence of 0.5. This scoring metric varies from zero to one, with one being an error-free classifier. The details of constructing a Bayesian classifier from biomarker population measurements are described in Example 3.
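
By way of a non-limiting sketch, an empirical AUC can be computed from classifier scores using the standard rank-based formulation, in which the AUC equals the probability that a randomly chosen disease sample scores higher than a randomly chosen control sample (ties counting one half). This is an assumption made for illustration; the source does not state how the AUC was computed, and the function name and scores below are hypothetical.

```python
def auc(control_scores, disease_scores):
    """Empirical area under the ROC curve from two lists of scores.

    Higher scores are assumed to indicate disease.
    """
    wins = 0.0
    for d in disease_scores:
        for c in control_scores:
            if d > c:
                wins += 1.0   # disease sample correctly ranked higher
            elif d == c:
                wins += 0.5   # ties count as half
    return wins / (len(control_scores) * len(disease_scores))


# Hypothetical classifier scores.
print(auc([0.1, 0.3, 0.4], [0.35, 0.8, 0.9]))
```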

Example 3 Naïve Bayesian Classification for Lung Cancer

From the list of biomarkers identified as useful for discriminating between NSCLC and the high risk control group, a panel of five biomarkers was selected and a naïve Bayes classifier was constructed (see Table 14). The class-dependent probability density functions (pdfs), p(x_i|c) and p(x_i|d), where x_i is the log of the measured RFU value for biomarker i, and c and d refer to the control and disease populations, were modeled as normal distribution functions characterized by a mean μ and variance σ². The parameters for pdfs of the five biomarkers are listed in Table 15 and an example of the raw data along with the model fit to a normal pdf is displayed in FIG. 5. The underlying assumption appears to fit the data quite well as evidenced by FIG. 5.

The naïve Bayes classification for such a model is given by the following equation, where P(d) is the prevalence of the disease in the population

$\ln \frac{p\left( c \mid \underset{\sim}{x} \right)}{p\left( d \mid \underset{\sim}{x} \right)} = \sum_{i = 1}^{n} \left( \ln \frac{\sigma_{d,i}}{\sigma_{c,i}} - \frac{1}{2}\left[ \left( \frac{x_{i} - \mu_{c,i}}{\sigma_{c,i}} \right)^{2} - \left( \frac{x_{i} - \mu_{d,i}}{\sigma_{d,i}} \right)^{2} \right] \right) + \ln \frac{1 - P(d)}{P(d)}$

appropriate to the test and n=5 here. Each of the terms in the summation is a log-likelihood ratio for an individual marker and the total log-likelihood ratio of a sample $\underset{\sim}{x}$ being free from the disease of interest (i.e., in this case, NSCLC) versus having the disease is simply the sum of these individual terms plus a term that accounts for the prevalence of the disease. For simplicity, we assume P(d)=0.5 so that

$\ln \frac{1 - P(d)}{P(d)} = 0.$

Given an unknown sample measurement in log(RFU) for each of the five biomarkers, the individual components comprising the log likelihood ratio for control versus disease class are tabulated and can be computed from the parameters in Table 15 and the values of $\underset{\sim}{x}$. The sum of the individual log likelihood ratios is 3.47, or a likelihood of being free from the disease versus having the disease of 32:1, where likelihood=e^(3.47)=32. All five biomarkers are consistently found to favor the control group. Multiplying the likelihoods together gives the same result as that shown above: a likelihood of 32:1 that the unknown sample is free from the disease. In fact, this sample came from the control population in the training set. Although this example demonstrates the classification of serum samples using the biomarkers in Table 15, the same approach can be used in any tissue type with any set of biomarkers from Table 21.
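
The calculation in this example can be sketched as follows. This is a minimal illustration of the naïve Bayes log-likelihood ratio for normally distributed markers; the function name and all parameter values are hypothetical placeholders and are not the values of Table 15. A positive result favors the control class and a negative result favors the disease class.

```python
import math


def log_likelihood_ratio(x, control, disease, prevalence=0.5):
    """ln[p(c|x) / p(d|x)] for independent, normally distributed markers.

    x        -- log(RFU) measurement for each marker
    control  -- (mean, standard deviation) per marker, control population
    disease  -- (mean, standard deviation) per marker, disease population
    """
    # Prevalence term; equals zero when prevalence is 0.5.
    total = math.log((1.0 - prevalence) / prevalence)
    for x_i, (mu_c, sigma_c), (mu_d, sigma_d) in zip(x, control, disease):
        z_c = (x_i - mu_c) / sigma_c
        z_d = (x_i - mu_d) / sigma_d
        # Per-marker log-likelihood ratio, control versus disease.
        total += math.log(sigma_d / sigma_c) - 0.5 * (z_c ** 2 - z_d ** 2)
    return total


# Hypothetical two-marker panel (NOT the parameters of Table 15).
control_params = [(7.0, 0.4), (9.1, 0.3)]
disease_params = [(7.6, 0.5), (8.7, 0.3)]
llr = log_likelihood_ratio([7.1, 9.0], control_params, disease_params)
print(llr, math.exp(llr))  # log ratio and the corresponding likelihood ratio
```

Exponentiating the summed log ratio gives the likelihood ratio, which is how a sum of 3.47 corresponds to a likelihood of approximately 32:1.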

Example 4 Greedy Algorithm for Selecting Biomarker Panels for Classifiers

Part 1

This example describes the selection of biomarkers from Table 21 to form panels that can be used as classifiers in any of the methods described herein. Panels of biomarkers containing MMP-12 and subsets of the biomarkers in Table 21 were selected to construct classifiers with good performance. This method was also used to determine which potential markers were included as biomarkers in Example 2.

The measure of classifier performance used here is the area under the ROC curve (AUC); a performance of 0.5 is the baseline expectation for a random (coin toss) classifier, a classifier worse than random would score between 0.0 and 0.5, and a classifier with better than random performance would score between 0.5 and 1.0. A perfect classifier with no errors would have a sensitivity of 1.0, a specificity of 1.0 and an AUC of 1.0. One can apply the methods described in Example 4 to other common measures of performance such as the F-measure, the sum of sensitivity and specificity, or the product of sensitivity and specificity. Specifically, one might want to treat sensitivity and specificity with differing weight, so as to select those classifiers which perform with higher specificity at the expense of some sensitivity, or to select those classifiers which perform with higher sensitivity at the expense of some specificity. Since the method described here only involves a measure of “performance”, any weighting scheme which results in a single performance measure can be used. Different applications will have different benefits for true positive and true negative findings, and also different costs associated with false positive findings than with false negative findings. For example, screening asymptomatic smokers and the differential diagnosis of benign nodules found on CT will not in general have the same optimal trade-off between specificity and sensitivity. The different demands of the two tests will in general require setting different weighting to positive and negative misclassifications, reflected in the performance measure. Changing the performance measure will in general change the exact subset of markers selected from Table 21 for a given set of data.
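
One possible single-valued performance measure of the kind discussed above is a weighted sum of sensitivity and specificity. The sketch below is illustrative only; the function name, the label strings, and the weights are assumptions and are not part of the described method. With both weights equal to 1.0 it reduces to the sum of sensitivity and specificity.

```python
def weighted_performance(true_labels, predicted_labels,
                         sensitivity_weight=1.0, specificity_weight=1.0):
    """Weighted sum of sensitivity and specificity.

    Labels are the strings "disease" and "control"; both classes are
    assumed to be present in true_labels.
    """
    tp = fn = tn = fp = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == "disease":
            if predicted == "disease":
                tp += 1
            else:
                fn += 1
        else:
            if predicted == "control":
                tn += 1
            else:
                fp += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity_weight * sensitivity + specificity_weight * specificity


# Hypothetical calls on eight samples.
truth = ["disease"] * 4 + ["control"] * 4
calls = ["disease", "disease", "disease", "control",
         "control", "control", "disease", "disease"]
print(weighted_performance(truth, calls))
print(weighted_performance(truth, calls, specificity_weight=2.0))
```

Raising the specificity weight favors classifiers that produce fewer false positives, which may suit one diagnostic application more than another.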

For the Bayesian approach to the discrimination of lung cancer samples from control samples described in Example 3, the classifier was completely parameterized by the distributions of biomarkers in the disease and benign training samples, and the list of biomarkers was chosen from Table 21; that is to say, the subset of markers chosen for inclusion determined a classifier in a one-to-one manner given a set of training data.

The greedy method employed here was used to search for the optimal subset of markers from Table 21. For small numbers of markers or classifiers with relatively few markers, every possible subset of markers was enumerated and evaluated in terms of the performance of the classifier constructed with that particular set of markers (see Example 4, Part 2). (This approach is well known in the field of statistics as “best subset selection”; see, e.g., The Elements of Statistical Learning: Data Mining, Inference, and Prediction, T. Hastie, et al., editors, Springer Science+Business Media, LLC, 2nd edition, 2009). However, for the classifiers described herein, the number of combinations of multiple markers can be very large, and it was not feasible to evaluate every possible set of five markers, for example, from the list of 86 markers (Table 21) (i.e., 34,826,302 combinations). Because of the impracticality of searching through every subset of markers, the single optimal subset may not be found; however, by using this approach, many excellent subsets were found, and, in many cases, any of these subsets may represent an optimal one.
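
The combination count quoted above can be checked directly. The short sketch below uses a hypothetical helper name, subset_counts, and shows how quickly the number of candidate subsets grows with panel size for a list of 86 markers.

```python
import math


def subset_counts(n_markers, max_size):
    """Number of distinct marker subsets of each size from 1 to max_size."""
    return {k: math.comb(n_markers, k) for k in range(1, max_size + 1)}


for size, count in subset_counts(86, 5).items():
    print(size, count)
```

The count for five markers is 34,826,302, in agreement with the figure given above.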

Instead of evaluating every possible set of markers, a “greedy” forward stepwise approach may be followed (see, e.g., Dabney A R, Storey J D (2007) Optimality Driven Nearest Centroid Classification from Genomic Data. PLoS ONE 2(10): e1002. doi:10.1371/journal.pone.0001002). Using this method, a classifier is started with the best single marker (based on KS-distance for the individual markers) and is grown at each step by trying, in turn, each member of a marker list that is not currently a member of the set of markers in the classifier. The one marker which scores best in combination with the existing classifier is added to the classifier. This is repeated until no further improvement in performance is achieved. Unfortunately, this approach may miss valuable combinations of markers in which some of the individual markers are not chosen before the process stops.
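
The forward stepwise procedure just described can be sketched as follows. This is a simplified illustration under stated assumptions: the score argument stands in for whatever performance measure is used (for example, the AUC of a classifier built from the subset), and the marker names A to D and the toy scoring function are hypothetical.

```python
def forward_stepwise(markers, score, first_marker):
    """Grow a marker set one marker at a time until the score stops improving."""
    selected = [first_marker]
    best_score = score(selected)
    while True:
        candidates = [m for m in markers if m not in selected]
        if not candidates:
            break
        # Try each remaining marker in combination with the current set.
        best_candidate = max(candidates, key=lambda m: score(selected + [m]))
        candidate_score = score(selected + [best_candidate])
        if candidate_score <= best_score:
            break  # no further improvement
        selected.append(best_candidate)
        best_score = candidate_score
    return selected, best_score


# Toy score: each marker adds a fixed benefit, each added marker costs 0.03.
benefit = {"A": 0.30, "B": 0.20, "C": 0.04, "D": 0.01}


def toy_score(subset):
    return 0.5 + sum(benefit[m] for m in subset) - 0.03 * len(subset)


print(forward_stepwise(list(benefit), toy_score, first_marker="A"))
```

In this toy case the procedure stops after three markers because adding the fourth lowers the score.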

The greedy procedure used here was an elaboration of the preceding forward stepwise approach, in that, to broaden the search, rather than keeping just a single candidate classifier (marker subset) at each step, a list of candidate classifiers was kept. The list was seeded with every single-marker subset (using every marker in the table on its own). The list was expanded in steps by deriving new classifiers (marker subsets) from the ones currently on the list and adding them to the list. Each marker subset currently on the list was extended by adding any marker from Table 1 not already part of that classifier, and which would not, on its addition to the subset, duplicate an existing subset (these are termed “permissible markers”). Every existing marker subset was extended by every permissible marker from the list. Clearly, such a process would eventually generate every possible subset, and the list would run out of space. Therefore, all the generated classifiers were kept only while the list was less than some predetermined size (often enough to hold all three-marker subsets). Once the list reached the predetermined size limit, it became elitist; that is, only those classifiers which showed a certain level of performance were kept on the list, and the others fell off the end of the list and were lost. This was achieved by keeping the list sorted in order of classifier performance; new classifiers which were at least as good as the worst classifier currently on the list were inserted, forcing the expulsion of the current bottom underachiever. One further implementation detail is that the list was completely replaced on each generational step; therefore, every classifier on the list had the same number of markers, and at each step the number of markers per classifier grew by one.
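
A simplified sketch of such a list-based search is given below. It is not the implementation used in this example: for brevity the list is truncated to its size limit after each generation rather than maintained by sorted insertion, and the function name, marker names, and toy scoring function are hypothetical.

```python
def elitist_search(markers, score, list_size, target_size):
    """Keep the best `list_size` subsets per generation, growing by one marker."""

    def rank(subsets):
        # Best score first; ties broken alphabetically for reproducibility.
        return sorted(subsets, key=lambda s: (-score(s), sorted(s)))[:list_size]

    # Seed the list with every single-marker subset.
    current = rank({frozenset([m]) for m in markers})
    for _ in range(target_size - 1):
        extended = set()  # a set discards duplicate subsets automatically
        for subset in current:
            for m in markers:
                if m not in subset:  # only permissible markers
                    extended.add(subset | {m})
        current = rank(extended)  # the list is replaced each generation
    return current


# Toy score in which markers C and D are weak alone but strong together.
single = {"A": 0.70, "B": 0.65, "C": 0.60, "D": 0.55}


def pair_score(subset):
    if {"C", "D"} <= subset:
        return 0.95
    return max(single[m] for m in subset)


print(elitist_search(list(single), pair_score, list_size=3, target_size=2))
```

With a list size of 3 the search retains marker C long enough to discover the strong pair C and D, whereas a list size of 1, which is equivalent to the plain forward stepwise approach, does not.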

Since this method produced a list of candidate classifiers using different combinations of markers, one may ask if the classifiers can be combined in order to avoid errors which might be made by the best single classifier, or by minority groups of the best classifiers. Such “ensemble” and “committee of experts” methods are well known in the fields of statistical and machine learning and include, for example, “averaging”, “voting”, “stacking”, “bagging” and “boosting” (see, e.g., The Elements of Statistical Learning: Data Mining, Inference, and Prediction, T. Hastie, et al., editors, Springer Science+Business Media, LLC, 2nd edition, 2009). These combinations of simple classifiers provide a method for reducing the variance in the classifications due to noise in any particular set of markers by including several different classifiers and therefore information from a larger set of the markers from the biomarker table, effectively averaging between the classifiers. An example of the usefulness of this approach is that it can prevent outliers in a single marker from adversely affecting the classification of a single sample. The requirement to measure a larger number of signals may be impractical in conventional “one marker at a time” antibody assays but has no downside for a fully multiplexed aptamer assay. Techniques such as these benefit from a more extensive table of biomarkers and use the multiple sources of information concerning the disease processes to provide a more robust classification.
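
Two of the simplest combination schemes named above, averaging and voting, can be sketched as follows. The sketch assumes that each classifier reports a log-likelihood ratio with the sign convention of Example 3 (positive values favor the control class); the function names and the numerical values are hypothetical.

```python
def average_call(log_ratios):
    """Classify from the mean log-likelihood ratio of several classifiers."""
    mean = sum(log_ratios) / len(log_ratios)
    return "control" if mean > 0 else "disease"


def majority_call(log_ratios):
    """Classify by majority vote; each classifier votes by the sign of its ratio."""
    votes_for_control = sum(1 for r in log_ratios if r > 0)
    return "control" if votes_for_control > len(log_ratios) / 2 else "disease"


# Hypothetical outputs of three classifiers for one sample; the third is
# dominated by an outlying measurement in one of its markers.
ratios = [0.4, 0.3, -5.0]
print(average_call(ratios), majority_call(ratios))
```

In this toy case the vote is unaffected by the single extreme value while the average is not, which illustrates why the choice of combination scheme matters when outliers are a concern.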

Part 2

The biomarkers selected in Table 1 gave rise to classifiers which perform better than classifiers built with “non-markers” (i.e., proteins having signals that did not meet the criteria for inclusion in Table 1 (as described in Example 2)).

For classifiers containing only one, two, and three markers, all possible classifiers obtained using the biomarkers in Table 1 were enumerated and examined for the distribution of performance compared to classifiers built from a similar table of randomly selected non-marker signals.

In FIG. 17 and FIG. 18, the sum of the sensitivity and specificity was used as the measure of performance; a performance of 1.0 is the baseline expectation for a random (coin toss) classifier. The histogram of classifier performance was compared with the histogram of performance from a similar exhaustive enumeration of classifiers built from a “non-marker” table of 40 non-marker signals; the 40 signals were randomly chosen from 400 aptamers that did not demonstrate differential signaling between control and disease populations (KS-distance<1.4).

FIG. 17 shows histograms of the performance of all possible one, two, and three-marker classifiers built from the biomarker parameters in Table 13 for biomarkers that can discriminate between benign nodules and NSCLC and compares these classifiers with all possible one, two, and three-marker classifiers built using the 40 “non-marker” aptamer RFU signals. FIG. 17A shows the histograms of single marker classifier performance, FIG. 17B shows the histogram of two marker classifier performance, and FIG. 17C shows the histogram of three marker classifier performance.

In FIG. 17, the solid lines represent the histograms of the classifier performance of all one, two, and three-marker classifiers using the biomarker data for benign nodules and NSCLC in Table 13. The dotted lines are the histograms of the classifier performance of all one, two, and three-marker classifiers using the data for benign nodules and NSCLC but using the set of random non-marker signals.

FIG. 18 shows histograms of the performance of all possible one, two, and three-marker classifiers built from the biomarker parameters in Table 12 for biomarkers that can discriminate between asymptomatic smokers and NSCLC and compares these with all possible one, two, and three-marker classifiers built using 40 “non-marker” aptamer RFU signals. FIG. 18A shows the histograms of single marker classifier performance, FIG. 18B shows the histogram of two marker classifier performance, and FIG. 18C shows the histogram of three marker classifier performance.

In FIG. 18, the solid lines represent the histograms of the classifier performance of all one, two, and three-marker classifiers using the biomarker parameters for asymptomatic smokers and NSCLC in Table 12. The dotted lines are the histograms of the classifier performance of all one, two, and three-marker classifiers using the data for asymptomatic smokers and NSCLC but using the set of random non-marker signals.

The classifiers built from the markers listed in Table 1 form a distinct histogram, well separated from the classifiers built with signals from the “non-markers” for all one-marker, two-marker, and three-marker comparisons. The performance and AUC score of the classifiers built from the biomarkers in Table 1 also increase faster with the number of markers than do the classifiers built from the non-markers; the separation between the marker and non-marker classifiers increases as the number of markers per classifier increases. All classifiers built using the biomarkers listed in Tables 38 and 39 perform distinctly better than classifiers built using the “non-markers”.

Part 3

To test whether a core subset of markers accounted for the good performance of the classifiers, half of the markers were randomly dropped from the lists of biomarkers in Tables 38 and 39. The performance, as measured by sensitivity plus specificity, of classifiers for distinguishing benign nodules from malignant nodules dropped slightly by 0.07 (from 1.74 to 1.67), and the performance of classifiers for distinguishing smokers who had cancer from those who did not also dropped slightly by 0.06 (from 1.76 to 1.70). The implication of the performance characteristics of subsets of the biomarker table is that multiple subsets of the listed biomarkers are effective in building a diagnostic test, and no particular core subset of markers dictates classifier performance.

In the light of these results, classifiers that excluded the best markers from Tables 12 and 13 were tested. FIG. 19 compares the performance of classifiers built with the full list of biomarkers in Tables 12 and 13 with the performance of classifiers built with a set of biomarkers from Tables 38 and 39 excluding top ranked markers.

FIG. 19 demonstrates that classifiers constructed without the best markers perform well, implying that the performance of the classifiers was not due to some small core group of markers and that the changes in the underlying processes associated with disease are reflected in the activities of many proteins. Many subsets of the biomarkers in Table 1 performed close to optimally, even after removing the top 15 of the 40 markers from Table 1.

FIG. 19A shows the effect on classifiers for discriminating benign nodules from NSCLC built with 2 to 10 markers. Even after dropping the 15 top-ranked markers (ranked by KS-distance) from Table 13, the benign nodule vs. NSCLC performance increased with the number of markers selected from the table to reach over 1.65 (Sensitivity+Specificity).

FIG. 19B shows the effect on classifiers for discriminating asymptomatic smokers from NSCLC built with 2 to 10 markers. Even after dropping the 15 top-ranked markers (ranked by KS-distance) from Table 12, the asymptomatic smokers vs. NSCLC performance increased with the number of markers selected from the table to reach over 1.7 (Sensitivity+Specificity), and closely approached the performance of the best classifier selected from the full list of biomarkers in Table 12.

Finally, FIG. 20 shows the ROC performance of typical classifiers constructed from the list of parameters in Tables 12 and 13 according to Example 3. FIG. 20A shows the model performance from assuming the independence of markers as in Example 3, and FIG. 20B shows the actual ROC curves using the assay data set used to generate the parameters in Tables 12 and 13. It can be seen that the performance for a given number of selected markers was qualitatively in agreement, and that quantitative agreement degraded as the number of markers increased. (This is consistent with the notion that the information contributed by any particular biomarker concerning the disease processes is redundant with the information contributed by other biomarkers provided in Tables 12 and 13). FIG. 20 thus demonstrates that Tables 12 and 13 in combination with the methods described in Example 3 enable the construction and evaluation of a great many classifiers useful for the discrimination of NSCLC from benign nodules and the discrimination of asymptomatic smokers who have NSCLC from those who do not have NSCLC.

Example 5 Aptamer Specificity Demonstration in a Pull-Down Assay

The final readout on the multiplex assay is based on the amount of aptamer recovered after the successive capture steps in the assay. The multiplex assay is based on the premise that the amount of aptamer recovered at the end of the assay is proportional to the amount of protein in the original complex mixture (e.g., plasma). In order to demonstrate that this signal is indeed derived from the intended analyte rather than from non-specifically bound proteins in plasma, we developed a gel-based pull-down assay in plasma. This assay can be used to visually demonstrate that a desired protein is in fact pulled out from plasma after equilibration with an aptamer as well as to demonstrate that aptamers bound to their intended protein targets can survive as a complex through the kinetic challenge steps in the assay. In the experiments described in this example, recovery of protein at the end of this pull-down assay requires that the protein remain non-covalently bound to the aptamer for nearly two hours after equilibration. Importantly, in this example we also provide evidence that non-specifically bound proteins dissociate during these steps and do not contribute significantly to the final signal. It should be noted that the pull-down procedure described in this example includes all of the key steps in the multiplex assay described above.

Plasma Pull-Down Assay

Plasma samples were prepared by diluting 50 μL EDTA-plasma to 100 μL in SB18 with 0.05% Tween-20 (SB18T) and 2 μM Z-Block. The plasma solution was equilibrated with 10 pmoles of a PBDC-aptamer in a final volume of 150 μL for 2 hours at 37° C. After equilibration, complexes and unbound aptamer were captured with 133 μL of a 7.5% Streptavidin-agarose bead slurry by incubating with shaking for 5 minutes at RT in a Durapore filter plate. The samples bound to beads were washed with biotin and with buffer under vacuum as described in Example 1. After washing, bound proteins were labeled with 0.5 mM NHS-S-S-biotin, 0.25 mM NHS-Alexa647 in the biotin diluent for 5 minutes with shaking at RT. This staining step allows biotinylation for capture of protein on streptavidin beads as well as highly sensitive staining for detection on a gel. The samples were washed with glycine and with buffer as described in Example 1. Aptamers were released from the beads by photocleavage using a Black Ray light source for 10 minutes with shaking at RT. At this point, the biotinylated proteins were captured on 0.5 mg MyOne Streptavidin beads by shaking for 5 minutes at RT. This step will capture proteins bound to aptamers as well as proteins that may have dissociated from aptamers since the initial equilibration. The beads were washed as described in Example 1. Proteins were eluted from the MyOne Streptavidin beads by incubating with 50 mM DTT in SB17T for 25 minutes at 37° C. with shaking. The eluate was then transferred to MyOne beads coated with a sequence complementary to the 3′ fixed region of the aptamer and incubated for 25 minutes at 37° C. with shaking. This step captures all of the remaining aptamer. The beads were washed 2× with 100 μL SB17T for 1 minute and 1× with 100 μL SB19T for 1 minute. Aptamer was eluted from these final beads by incubating with 45 μL 20 mM NaOH for 2 minutes with shaking to disrupt the hybridized strands. 40 μL of this eluate was neutralized with 10 μL 80 mM HCl containing 0.05% Tween-20. Aliquots representing 5% of the eluate from the first set of beads (representing all plasma proteins bound to the aptamer) and 20% of the eluate from the final set of beads (representing all plasma proteins remaining bound at the end of our clinical assay) were run on a NuPAGE 4-12% Bis-Tris gel (Invitrogen) under reducing and denaturing conditions. Gels were imaged on an Alpha Innotech FluorChem Q scanner in the Cy5 channel to image the proteins.

Pull-down gels were run for aptamers selected against LBP (˜1×10⁻⁷ M in plasma, polypeptide MW ˜60 kDa), C9 (˜1×10⁻⁶ M in plasma, polypeptide MW ˜60 kDa), and IgM (˜9×10⁻⁶ M in plasma, MW ˜70 kDa and 23 kDa), respectively (see FIG. 16).

For each gel, lane 1 is the eluate from the Streptavidin-agarose beads, lane 2 is the final eluate, and lane 3 is a MW marker lane (major bands are at 110, 50, 30, 15, and 3.5 kDa from top to bottom). It is evident from these gels that there is a small amount of non-specific binding of plasma proteins in the initial equilibration, but only the target remains after performing the capture steps of the assay. It is clear that the single aptamer reagent is sufficient to capture its intended analyte with no up-front depletion or fractionation of the plasma. The amount of remaining aptamer after these steps is then proportional to the amount of the analyte in the initial sample.

Example 6 Analysis of NSCLC Surgical Resections

To demonstrate the utility of the platform based technology described herein to identify disease-related biomarkers from tissues, homogenized tissue samples from surgical resections obtained from eight NSCLC patients were analyzed. All NSCLC patients were smokers, ranging in age from 47 to 75 years old and covering NSCLC stages 1A through 3B (Table 17). All tissue samples were obtained by freezing the tissue within 5-10 minutes of excision during surgery and after placing the tissues in OCT medium (10.24% polyvinyl alcohol, 4.26% polyethylene glycol, and 85.5% non-reactive ingredients). Three samples were obtained from each resection: tumor tissue sample, adjacent healthy tissue (within 1 cm of the tumor) and distant uninvolved lung tissue. While keeping the samples constantly frozen, five 10 μm thick sections were cut, trimmed of excess OCT from around the tissue, and placed into frozen 1.5 mL microfuge tubes. Following the addition of 200 μL homogenization buffer (SB18 buffer plus PI cocktail (Pierce HALT protease inhibitor cocktail without magnesium)), the samples were homogenized in the microfuge tubes on ice with a rotary pestle for 30 seconds, until no tissue fragments were visible. The samples were then spun in a centrifuge at 21,000 g for 10 minutes and filtered through a 0.2 μm multiwell plate filter into a sterile multiwell plate. Five μL aliquots were taken for BCA protein assay and the rest of the sample was stored frozen and sealed in 96 well plates at −70° C.

Sample total protein was adjusted to 16 μg/mL in SB17T buffer (SB17 buffer containing 0.05% Tween-20) for proteomic profiling. Samples prepared in this manner were run on the multiplex aptamer assay which, as noted above, measures over 800 proteins as described previously (Ostroff et al., Nature Precedings, http://precedings.nature.com/documents/4537/version/1 (2010)). Among the measured analytes, most were unchanged between tumor, adjacent tissue and distal tissue. However, some proteins were clearly suppressed (FIG. 24) while others were elevated substantially in tumor tissues (FIG. 23) compared to adjacent and distal tissues.

The foregoing embodiments and examples are intended only as examples. No particular embodiment, example, or element of a particular embodiment or example is to be construed as a critical, required, or essential element or feature of any of the claims. Further, no element described herein is required for the practice of the appended claims unless expressly described as “essential” or “critical.” Various alterations, modifications, substitutions, and other variations can be made to the disclosed embodiments without departing from the scope of the present application, which is defined by the appended claims. The specification, including the figures and examples, is to be regarded in an illustrative manner, rather than a restrictive one, and all such modifications and substitutions are intended to be included within the scope of the application. Accordingly, the scope of the application should be determined by the appended claims and their legal equivalents, rather than by the examples given above. For example, steps recited in any of the method or process claims may be executed in any feasible order and are not limited to an order presented in any of the embodiments, the examples, or the claims. Further, in any of the aforementioned methods, one or more biomarkers of Table 18, Table 20, or Table 21 can be specifically excluded either as an individual biomarker or as a biomarker from any panel.

TABLE 1 Lung Cancer Biomarkers Column #4 Column #5 Gene Benign Column #6Column #2 Designation Nodule Smokers Column #1 Biomarker Column #3(Entrez versus versus Biomarker # Designation Alternate Protein NamesGene Link) NSCLC NSCLC 1 AMPM2 Methionine aminopeptidase 2 METAP2 Xp67eIF2 p67 Initiation factor 2-associated 67 kDa glycoprotein PeptidaseM 2 MetAP 2 MAP 2 2 Apo A-I apolipoprotein A-I APOA1 X ApolipoproteinA-1 3 b-ECGF FGF acidic FGF1 X FGF1 beta-ECGF Beta-endothelial cellgrowth factor 4 BLC BLC B lymphocyte chemoattractant CXCL13 X X Smallinducible cytokine B13 CXCL13 BCA-1 5 BMP-1 Bone morphogenetic protein 1BMP1 X X Procollagen C-proteinase PCP Mammalian tolloid protein mTld 6BTK Tyrosine-protein kinase BTK BTK X Bruton tyrosine kinaseAgammaglobulinaemia tyrosine kinase ATK B-cell progenitor kinase 7 C1sComplement C1s subcomponent C1S X C1s, Activated, Two-Chain Form 8 C9Complement component C9 C9 X X 9 Cadherin E Cadherin-1 CDH1 X Epithelialcadherin E-cadherin Uvomorulin CAM 120/80 CD_antigen = CD324 10Cadherin-6 Kidney-cadherin CDH6 X K-cadherin 11 Calpain I Calpain I(dimer of Calpain-1 CAPN1 X catalytic subunit and Calpain small CAPNS1subunit 1) synonyms of the catalytic subunit include Calpain-1 largesubunit: Calcium-activated neutral proteinase 1 Micromolar-calpain Cellproliferation-inducing gene 30 protein synonyms of the small subunitinclude: Calcium-dependent protease small subunit 1 Calcium-activatedneutral proteinase small subunit CANP small subunit 12 Catalase CatalaseCAT X 13 CATC Dipeptidyl-peptidase 1 precursor CTSC XDipeptidyl-peptidase I DPP-I DPPI Cathepsin C Cathepsin J Dipeptidyltransferase 14 Cathepsin H Cathepsin H CTSH X 15 CD30 Ligand Tumornecrosis factor ligand TNFSF8 X X superfamily member 8 CD30-L CD153antigen 16 CDK5-p35 CDK5/p35 is a dimer of Cell division CDK5 X proteinkinase 5, and the p35 chain CDK5R1 of Cyclin-dependent kinase 5activator 1 Cell division protein kinase 5 is also known as:Cyclin-dependent kinase 5 Tau protein 
kinase II catalytic subunitSerine/threonine-protein kinase PSSALRE p35 chain of Cyclin-dependentkinase 5 activator 1 is also known as: Cyclin-dependent kinase 5regulatory subunit 1 CDK5 activator 1 Cyclin-dependent kinase 5regulatory subunit 1 Tau protein kinase II regulatory subunit. 17 CK-MBCreatine Phosphokinase-MB CKB X X Isoenzyme, which is a dimer of CKMCreatine kinase M-type and B-type Creatine kinase M and B chains M-CKand B-CK CKM and CKB 18 CNDP1 Beta-Ala-His dipeptidase CNDP1 X XCarnosine dipeptidase 1 CNDP dipeptidase 1 Serum carnosinase Glutamatecarboxypeptidase-like protein 2 19 Contactin-5 Neural recognitionmolecule NB-2 CNTN5 X hNB-2 20 CSK Tyrosine-protein kinase CSK CSK X XC-SRC kinase Protein-tyrosine kinase CYL 21 Cyclophilin A Cyclophilin APPIA X Peptidyl-prolyl cis-trans isomerase A PPIase Peptidylprolylisomerase Cyclosporin A-binding protein Rotamase A PPIase A 22Endostatin Endostatin, which is cleaved from COL18A1 X Collagen alpha-1(XVIII) chain 23 ERBB1 Epidermal growth factor receptor EGFR X XReceptor tyrosine-protein kinase ErbB-1 EGFR HER1 24 FGF-17 FibroblastGrowth Factor-17 FGF17 X X 25 FYN Proto-oncogene tyrosine-protein FYN Xkinase Fyn Protooncogene Syn p59-Fyn 26 GAPDH, liver Glyceraldehyde3-phosphate GAPDH X X dehydrogenase 27 HMG-1 High mobility group proteinB1 HMGB1 X amphoterin Neurite growth-promoting protein 28 HSP 90a Heatshock protein HSP 90-alpha HSP90AA1 X X HSP 86 Renal carcinoma antigenNY-REN- 38 29 HSP 90b Heat shock protein HSP 90-beta HSP90AB1 X HSP 90HSP 84 30 IGFBP-2 Insulin-like growth factor-binding IGFBP2 X X protein2 (IGF-binding protein 2; IGFBP-2; IBP-2; BP2) 31 IL-15 RaInterleukin-15 receptor subunit alpha IL15RA X 32 IL-17B Interleukin-17BIL17B X Neuronal interleukin-17 related factor Interleukin-20Cytokine-like protein ZCYTO7 33 IMB1 Importin subunit beta-1 KPNB1 XKaryopherin subunit beta-1 Nuclear factor P97 Importin-90 34 Kallikrein7 Kallikrein-7 KLK7 X hK7 Stratum corneum chymotryptic enzyme hSCCESerine 
protease 6 35 KPCI Protein kinase C iota type PRKCI X X nPKC-iotaAtypical protein kinase C- lambda/iota aPKC-lambda/iota PRKC-lambda/iota36 LDH-H 1 L-lactate dehydrogenase B chain LDHB X LDH-B LDH heartsubunit LDH-H Renal carcinoma antigen NY-REN- 46 37 LGMN Legumain LGMN XProtease, cysteine 1 Asparaginyl endopeptidase 38 LRIG3 Leucine-richrepeats and LRIG3 X X immunoglobulin-like domains protein 3 39Macrophage Macrophage mannose receptor 1 MRC1 X mannose MMR receptorC-type lectin domain family 13 member D CD_antigen = CD206 40 MEK1 Dualspecificity mitogen-activated MAP2K1 X X protein kinase kinase 1MAPK/ERK kinase 1 ERK activator kinase 1 41 METAP1 Methionineaminopeptidase 1 METAP1 X MetAP 1 MAP 1 Peptidase M1 42 Midkine Neuriteoutgrowth-promoting protein MDK X Neurite outgrowth-promoting factor 2Midgestation and kidney protein Amphiregulin-associated protein ARAP 43MIP-5 C-C motif chemokine 15 MIP5 X Small-inducible cytokine A15Macrophage inflammatory protein 5 Chemokine CC-2 HCC-2 NCC-3 MIP-1 deltaLeukotactin-1 LKN-1 Mrp-2b 44 MK13 Mitogen-activated protein kinase 13MAPK13 X MAP kinase p38 delta Mitogen-activated protein kinase p38 deltaStress-activated protein kinase 4 45 MMP-7 Matrilysin MMP7 X Pump-1protease Uterine metalloproteinase Matrix metalloproteinase-7 MMP-7Matrin 46 NACA Nascent polypeptide-associated NACA X complex subunitalpha NAC-alpha Alpha-NAC Allergen = Hom s 2 47 NAGK N-acetylglucosaminekinase NAGK X GlcNAc kinase 48 PARC C-C motif chemokine 18 CCL18 XSmall-inducible cytokine A18 Macrophage inflammatory protein 4 MIP-4Pulmonary and activation-regulated chemokine CC chemokine PARCAlternative macrophage activation- associated CC chemokine 1 AMAC-1Dendritic cell chemokine 1 DC-CK1 49 Proteinase-3 Proteinase-3 PRTN3 XPR-3 AGP7 P29 Myeloblastin Leukocyte proteinase 3 Wegener's autoantigenNeutrophil proteinase 4 NP4 C-ANCA antigen 50 Prothrombin Prothrombin F2X X (Coagulation factor II) 51 PTN Pleiotrophin PTN X Heparin-bindinggrowth-associated 
molecule HB-GAM Heparin-binding growth factor 8 HBGF-8Osteoblast-specific factor 1 OSF-1 Heparin-binding neurite outgrowth-promoting factor 1 HBNF-1 Heparin-binding brain mitogen HBBM 52 RAC1Ras-related C3 botulinum toxin RAC1 X substrate 1 p21-Rac1 Ras-likeprotein TC25 Cell migration-inducing gene 5 protein 53 Renin Renin REN XAngiotensinogenase 54 RGM-C Hemojuvelin HFE2 X Hemochromatosis type 2protein RGM domain family member C 55 SCF sR Mast/stem cell growthfactor KIT X X receptor (SCFR; Proto-oncogene tyrosine- protein kinaseKit; c-kit; CD_antigen = CD117) 56 sL-Selectin sL-Selectin SELL XLeukocyte adhesion molecule-1 Lymph node homing receptor LAM-1L-Selectin L-Selectin, soluble Leukocyte surface antigen Leu-8 TQ1gp90-MEL Leukocyte-endothelial cell adhesion molecule 1 LECAM1 CD62antigen-like family member L 57 TCTP Translationally-controlled tumorTPT1 X protein p23 Histamine-releasing factor HRF Fortilin 58 UBE2NUbiquitin-conjugating enzyme E2 N UBE2N X Ubiquitin-protein ligase NUbiquitin carrier protein N Ubc13 Bendless-like ubiquitin-conjugatingenzyme 59 Ubiquitin+1 Ubiquitin RPS27A X 60 VEGF Vascular endothelialgrowth factor A VEGFA X VEGF-A Vascular permeability factor 61 YESProto-oncogene tyrosine-protein YES X kinase Yes c-Yes p61-Yes

TABLE 2 Aptamer Concentrations

Target                        Final Aptamer Conc (nM)
AMPM2                         0.5
Apo A-I                       0.25
b-ECGF                        2
BLC                           0.25
BMP-1                         1
BTK                           0.25
C1s                           0.25
C9                            1
Cadherin E                    0.25
Cadherin-6                    0.5
Calpain I                     0.5
Catalase                      0.5
CATC                          0.5
Cathepsin H                   0.5
CD30 Ligand                   0.5
CDK5/p35                      0.5
CK-MB                         1
CNDP1                         0.5
Contactin-5                   1
CSK
Cyclophilin A                 0.5
Endostatin                    1
ERBB1                         0.5
FYN                           0.25
GAPDH, liver                  0.25
HMG-1                         0.5
HSP 90a                       0.5
HSP 90b                       0.5
IGFBP-2                       1
IL-15 Ra                      0.5
IL-17B                        0.5
IMB1                          1
Kallikrein 7                  0.5
KPCI                          0.25
LDH-H 1                       0.5
LGMN                          0.5
LRIG3                         0.25
Macrophage mannose receptor   2
MEK1                          0.5
METAP1                        0.25
Midkine                       0.5
MIP-5                         1
MK13                          1
MMP-7                         0.25
NACA                          0.5
NAGK                          0.5
PARC                          0.5
Proteinase-3                  1
Prothrombin                   0.5
PTN                           0.25
RAC1                          0.5
Renin                         0.25
RGM-C                         0.5
SCF sR                        1
sL-Selectin                   0.5
TCTP                          0.5
UBE2N                         0.5
Ubiquitin+1                   0.5
VEGF                          1
YES                           0.5

TABLE 3

Site               NSCLC   Benign Nodule   Asymptomatic Smokers
1                  32      0               47
2                  63      176             128
3                  70      195             94
4                  54      49              83
Sum                213     420             352
Males              51%     46%             49%
Females            49%     54%             51%
Median Age         68      60              57
Median Pack Years  40      42              34
Median FEV1        1.94    2.43            2.58
Median FEV1%       74      88              90
Median FEV1/FVC    70      72              73

TABLE 4 Biomarkers Identified in Benign Nodule-NSCLC in Aggregated DataSCF sR CNDP1 Stress-induced- phosphoprotein 1 RGM-C MEK1 LRIG3 ERBB1MDHC ERK-1 Cadherin E Catalase Cyclophilin A CK-MB BMP-1 Caspase-3METAP1 ART UFM1 HSP90a C9 RAC1 IGFBP-2 TCPTP Peroxiredoxin-1 Calpain IRPS6KA3 PAFAHbeta subunit KPCI IMB1 MK01 MMP-7 UBC9 Integrina1b1 β-ECGFUbiquitin+1 IDE HSP90b Cathepsin H CAMK2A NAGK CSK21 BLC FGF-17 BTKBARK1 Macrophage Thrombin eIF-5 mannose receptor MK13 LYN UFC1 NACAHSP70 RS7 GAPDH UBE2N PRKACA CSK TCTP AMPM2 Activin A RabGDPdissociationStress-induced- inhibitor beta phosphoprotein 1 Prothrombin MAPKAPK3

TABLE 5 Biomarkers Identified in Smoker-NSCLC in Aggregated Data SCF sRRenin Caspase-3 PTN CSK AMPM2 HSP90a Contactin-5 RS7 Kallikrein 7 UBE2NOCAD1 LRIG3 MPIF-1 HSP70 IGFBP-2 PRKACA GSK-3alpha PARC granzymeA FSTL3CD30 Ligand Ubiquitin+1 PAFAH beta subunit Prothrombin NAGK Integrina1b1 ERBB1 Cathepsin S ERK-1 KPCI TCTP CSK21 BTK UBC9 CATC GAPDH, liverMK13 MK01 CK-MB Cystatin C pTEN LDH-H1 RPS6KA3 b2-Microglobulin CNDP1IL-15Ra UFM1 RAC1 Calpain I UFC1 C9 MAPKAPK3 Peroxiredoxin-1 FGF-17 IMB1PKB Endostatin BARK1 IDE Cyclophilin A Cathepsin H HSP90b C1s MacrophageBGH3 mannose receptor CD30 Dtk BLC BMP-1 NACA XPNPEP1 SBDSRabGDPdissociation TNFsR-I inhibitor beta MIP-5 LYN DUS3 CCL28 METAP1MMP-7 MK12

TABLE 6 Biomarkers Identified in Benign Nodule-NSCLC by Site ERBB1FGF-17 LRIG3 CD30Ligand HMG-1 LGMN YES Proteinase-3 C9 MEK1 MK13 BLCMacrophage mannose receptor IL-17B ApoA-I CATC CNDP1 Cadherin-6 BMP-1

TABLE 7 Biomarkers Identified in Smoker-NSCLC by Site Kallikrein 7 CSKAzurocidin SCF sR FYN b2-Microglobulin ERBB1 BLC OCAD1 C9 TCTP LGMNLRIG3 Midkine PKB AMPM2 FGF-17 XPNPEP1 HSP90a MEK1 Cadherin-6sL-Selectin BMP-1 pTEN BTK LYN LYNB CNDP1 Integrin a1b1 DUS3 CDK5-p35PKB gamma Carbonic anhydrase XIII

TABLE 8 Biomarkers Identified in Benign Nodule-NSCLC in Blended Data SetYES Catalase PAFAH beta eIF-5 subunit MK13 Prothrombin AMPM2 TNFsR-ILRIG3 BTK TCPTP BLC HMG-1 DRG-1 BGH3 MAPKAPK3 ERBB1 UBE2N Ubiquitin+1b2-Microglobulin Cadherin E Activin A BARK1 SOD CK-MB TCTP LYN GSK-3alpha C9 UBC9 PRKACA Fibrinogen SCFsR NAGK LGMN ERK-1 CNDP1 Calpain IIntegrin a1b1 Cadherin-6 RGM-C GAPDH HSP70 IDE METAP1 UFM1 XPNPEP1 UFC1Macrophage Caspase-3 Stress-induced- PSA-ACT mannose receptorphosphoprotein1 BMP-1 b-ECGF RPS6KA3 CATC KPCI RAC1 SHP-2 pTEN IGFBP-2MDHC CEA PSA CSK Proteinase-3 OCAD1 CATE NACA MK01 Cyclophilin APeroxiredoxin-1 IMB1 MEK1 RabGDP SBDS dissociation inhibitor betaCathepsin H HSP90a DUS3 RS7 MMP-7 Thrombin CAMK2A Carbonic anhydraseXIII VEGF FGF-17 CaMKKalpha HSP90b ART CSK21

TABLE 9 Biomarkers Identified in Smoker-NSCLC in Blended Data Set SCFsRUBE2N CystatinC GSK-3alpha LRIG3 MIP-5 LYN CATC HSP90a Contactin-5MPIF-1 SBDS ERBB1 Ubiquitin+1 GCP-2 PAFAH beta subunit C9 MacrophageKPCI IMB1 mannose receptor AMPM2 PRKACA MK12 CSK21 Kallikrein 7Cathepsin S MAPKAPK3 PKB PTN BMP-1 Integrin a1b1 Dtk PARC Cyclophilin AHSP70 DUS3 CD30 Ligand CCL28 RPS6KA3 Calpain I Prothrombin EndostatinNACA TNFsR-I CSK Cathepsin H RS7 PTP-1B CK-MB Granzyme A Peroxiredoxin-1IDE BTK GAPDH, liver MMP-7 HSP90b C1s FGF-17 pTEN Fibrinogen IGFBP-2BARK1 UFM1 Caspase-3 LDH-H1 BLC UBC9 PSA-ACT RAC1 RabGDP dissociationFSTL3 OCAD1 inhibitor beta Renin CD30 BGH3 SOD CNDP1 MK13 UFC1 METAP1TCTP NAGK MK01 PSA IL-15Ra b2-Microglobulin ERK-1

TABLE 10 Biomarkers for Lung Cancer Benign Nodule Smokers AMPM2 YESSCFsR BMP-1 MK13 LRIG3 BTK LRIG3 HSP90a C1s HMG-1 ERBB1 C9 ERBB1 C9Cadherin E CadherinE AMPM2 Catalase CK-MB Kallikrein7 Cathepsin H C9 PTNCD30Ligand SCFsR PARC CK-MB CNDP1 CD30Ligand CNDP1 RGM-C ProthrombinContactin-5 METAP1 CSK CSK Macrophage CK-MB mannose receptor ERBB1 BMP-1BTK HMG-1 KPCI C1s HSP90a IGFBP-2 IGFBP-2 HSP90b CSK LDH-H1 IGFBP-2 NACARAC1 IL-15Ra IMB1 Renin IMB1 CathepsinH CNDP1 Kallikrein7 MMP-7 TCTPKPCI VEGF IL-15Ra LDH-H1 HSP90b UBE2N LRIG3 Catalase MIP-5 Macrophagemannose receptor Prothrombin Contactin-5 METAP1 ApoA-I Ubiquitin+1 MIP-5b-ECGF BLC MK13 BLC BMP-1 MMP-7 Cadherin-6 CDK5-p35 NACA Calpain ICyclophilin A PARC CATC Endostatin Prothrombin CD30Ligand FGF-17 PTNFGF-17 FYN RAC1 GAPDH GAPDH Renin HSP90a KPCI RGM-C IL-17B MEK1 SCF sRLGMN Midkine TCTP MEK1 sL-Selectin UBE2N NAGK Ubiquitin+1 Proteinase-3VEGF YES ApoA-I b-ECGF BLC Cadherin-6 Calpain I CATC CDK5-p35CyclophilinA Endostatin FYN FGF-17 GAPDH IL-17B LGMN MEK1 Midkine NAGKProteinase-3 sL-Selectin

TABLE 11 Aptamer To Solution Assay Up or Designated K_(d) LLOQ DownBiomarker (M) (M) Regulated AMPM2 3 × 10⁻¹⁰ NM Up Apo A-I 9 × 10⁻⁰⁹ 2 ×10⁻¹¹ Down β-ECGF 1 × 10⁻¹⁰ NM Up (pool) BLC 5 × 10⁻¹⁰ 7 × 10⁻¹⁴ Up(pool) BMP-1 2 × 10⁻¹⁰ 9 × 10⁻¹³ Down BTK 8 × 10⁻¹⁰ 2 × 10⁻¹³ Up (pool)C1s 8 × 10⁻⁰⁹ 7 × 10⁻¹² Up C9 1 × 10⁻¹¹ 1 × 10⁻¹⁴ Down Cadherin E 3 ×10⁻¹⁰ 2 × 10⁻¹² Down Cadherin-6 2 × 10⁻⁰⁹ 2 × 10⁻¹² Up Calpain I 2 ×10⁻¹¹ 7 × 10⁻¹⁴ Up Catalase 7 × 10⁻¹⁰ 8 × 10⁻¹⁴ Up (pool) CATC 8 × 10⁻⁰⁸NM Up Cathepsin H 1 × 10⁻⁰⁹ 8 × 10⁻¹³ Up (pool) CD30 Ligand 2 × 10⁻⁰⁹ 7× 10⁻¹³ Up (pool) CDK5/p35 2 × 10⁻¹⁰ NM Up CK-MB 1 × 10⁻⁰⁸ NM Down(pool) CNDP1 3 × 10⁻⁰⁸ NM Down Contactin-5 3 × 10⁻¹¹ NM Down CSK 3 ×10⁻¹⁰ 5 × 10⁻¹³ Up Cyclophilin 1 × 10⁻⁰⁹ 2 × 10⁻¹³ Up A (pool)Endostatin 5 × 10⁻¹⁰ 1 × 10⁻¹³ Up ERBB1 1 × 10⁻¹⁰ 4 × 10⁻¹⁴ Down FGF-175 × 10⁻¹⁰ NM Up (pool) FYN 3 × 10⁻⁰⁹ NM Up (pool) GAPDH 8 × 10⁻¹² 4 ×10⁻¹³ Up HMG-1 2 × 10⁻¹⁰ 1 × 10⁻¹² Up HSP 90α 1 × 10⁻¹⁰ 1 × 10⁻¹² UpHSP90β 2 × 10⁻¹⁰ 4 × 10⁻¹² Up IGFBP-2 6 × 10⁻¹⁰ 9 × 10⁻¹³ Up IL-15 Rα 4× 10⁻¹¹ 1 × 10⁻¹³ Up (pool) IL-17B 3 × 10⁻¹¹ 4 × 10⁻¹³ Up (pool) IMB1 8× 10⁻⁰⁸ NM Up (pool) Kallikrein 7 6 × 10⁻¹¹ 2 × 10⁻¹² Down KPCI 9 ×10⁻⁰⁹ NM Up LDH-H1 1 × 10⁻⁰⁹ 8 × 10⁻¹³ Up LGMN 7 × 10⁻⁰⁹ NM Up LRIG3 3 ×10⁻¹¹ 8 × 10⁻¹⁴ Down Macrophage 1 × 10⁻⁰⁹ 1 × 10⁻¹¹ Up mannose receptorMEK1 6 × 10⁻¹⁰ NM Up METAP1 7 × 10⁻¹¹ 9 × 10⁻¹³ Up Midkine 2 × 10⁻¹⁰ 4 ×10⁻¹¹ Up MIP-5 9 × 10⁻⁰⁹ 2 × 10⁻¹³ Up (pool) MK13 2 × 10⁻⁰⁹ NM Up MMP-77 × 10⁻¹¹ 3 × 10⁻¹³ Up NACA 2 × 10⁻¹¹ NM Up NAGK 2 × 10⁻⁰⁹ NM Up (pool)PARC 9 × 10⁻¹¹ 1 × 10⁻¹³ Up Proteinase-3 5 × 10⁻⁰⁹ 4 × 10⁻¹² Up (pool)Prothrombin 5 × 10⁻⁰⁹ 1 × 10⁻¹² Down PTN 4 × 10⁻¹¹ 5 × 10⁻¹² Up RAC1 7 ×10⁻¹¹ NM Up Renin 3 × 10⁻¹¹ 3 × 10⁻¹³ Up RGM-C 3 × 10⁻¹¹ NM Down SCF sR5 × 10⁻¹¹ 3 × 10⁻¹² Down sL-Selectin 2 × 10⁻¹⁰ 2 × 10⁻¹³ Down (pool)TCTP 2 × 10⁻¹¹ NM Up (pool) UBE2N 6 × 10−11 NM Up (pool) Ubiquitin+1 2 ×10⁻¹⁰ 1 × 10⁻¹² Up VEGF 4 × 10⁻¹⁰ 9 × 10⁻¹⁴ Up YES 2 × 10⁻⁰⁹ NM Up

TABLE 12 Parameters for Smoker Control Group

#     Biomarker      μ_c   σ_c²      μ_d   σ_d²      KS    p-value   AUC
(# = Biomarker # from Table 1)
1     AMPM2          3.05  1.07E−02  3.20  3.62E−02  0.45  5.55E−24  0.75
4     BLC            2.58  1.23E−02  2.72  3.97E−02  0.37  8.72E−17  0.74
5     BMP-1          4.13  1.32E−02  4.00  2.01E−02  0.38  1.21E−17  0.75
6     BTK            3.12  2.44E−01  3.51  2.45E−01  0.35  3.25E−15  0.72
7     C1s            4.01  3.47E−03  4.06  4.23E−03  0.31  4.68E−12  0.69
8     C9             5.31  3.54E−03  5.38  5.37E−03  0.43  3.49E−22  0.75
15    CD30 Ligand    3.21  2.86E−03  3.26  4.42E−03  0.31  1.08E−11  0.70
16    CDK5-p35       2.98  3.48E−03  3.02  4.75E−03  0.25  1.63E−07  0.67
17    CK-MB          3.25  5.18E−02  3.07  4.89E−02  0.33  1.42E−13  0.71
18    CNDP1          3.65  1.97E−02  3.52  3.07E−02  0.36  4.14E−16  0.73
19    Contactin-5    3.66  9.35E−03  3.59  1.33E−02  0.31  1.67E−11  0.68
20    CSK            3.25  6.59E−02  3.54  1.10E−01  0.41  1.33E−20  0.76
21    Cyclophilin A  4.42  6.04E−02  4.65  6.80E−02  0.38  2.17E−17  0.73
22    Endostatin     4.61  4.29E−03  4.67  1.07E−02  0.32  1.42E−12  0.69
23    ERBB1          4.17  2.25E−03  4.10  5.18E−03  0.47  9.39E−27  0.78
24    FGF-17         3.08  1.12E−03  3.11  1.31E−03  0.32  1.07E−12  0.71
25    FYN            3.18  6.88E−02  3.24  7.99E−02  0.13  1.53E−02  0.58
26    GAPDH          3.26  7.32E−02  3.51  1.62E−01  0.40  2.02E−19  0.68
28    HSP90a         4.45  1.86E−02  4.61  1.86E−02  0.50  3.09E−30  0.80
30    IGFBP-2        4.30  3.42E−02  4.48  4.17E−02  0.37  5.40E−17  0.74
31    IL-15 Ra       3.03  9.74E−03  3.12  2.10E−02  0.31  7.31E−12  0.69
34    Kallikrein 7   3.52  8.67E−03  3.44  1.21E−02  0.36  2.47E−15  0.70
35    KPCI           2.58  2.92E−03  2.66  1.01E−02  0.40  2.30E−19  0.74
36    LDH-H1         3.60  8.03E−03  3.67  1.45E−02  0.32  3.70E−12  0.68
38    LRIG3          3.55  3.10E−03  3.50  3.60E−03  0.36  1.39E−15  0.72
40    MEK1           2.81  1.54E−03  2.84  2.75E−03  0.28  1.96E−09  0.67
42    Midkine        3.21  3.13E−02  3.24  5.58E−02  0.13  1.90E−02  0.56
43    MIP-5          3.60  3.65E−02  3.77  5.88E−02  0.34  8.40E−14  0.70
48    PARC           4.90  1.94E−02  5.01  2.13E−02  0.34  7.01E−14  0.71
50    Prothrombin    4.68  5.37E−02  4.53  4.31E−02  0.32  1.09E−12  0.68
51    PTN            3.73  7.08E−03  3.80  7.36E−03  0.34  3.97E−14  0.72
52    RAC1           3.85  6.13E−02  4.09  7.31E−02  0.40  4.60E−19  0.72
53    Renin          3.25  2.52E−02  3.39  6.36E−02  0.30  4.23E−11  0.68
55    SCF sR         3.79  1.11E−02  3.68  1.48E−02  0.37  9.90E−17  0.75
56    sL-Selectin    4.46  5.63E−03  4.40  9.30E−03  0.30  6.24E−11  0.69
57    TCTP           4.19  4.69E−02  4.44  7.43E−02  0.43  9.69E−22  0.76
58    UBE2N          4.42  9.30E−02  4.67  9.53E−02  0.34  6.56E−14  0.72
59    Ubiquitin+1    4.25  1.75E−02  4.34  1.43E−02  0.31  1.55E−11  0.68
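As a rough consistency check on the Table 12 parameters, one can ask what AUC two Gaussian distributions with the tabulated means and variances would imply. The sketch below assumes a binormal model, in which AUC = Φ(|μ_d − μ_c| / sqrt(σ_c² + σ_d²)). This model is an illustrative assumption and is not necessarily how the tabulated AUC values were obtained; the names `normal_cdf` and `binormal_auc` are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def binormal_auc(mu_c, var_c, mu_d, var_d):
    """AUC implied by two Gaussian classes (assumed binormal model).

    mu_c, var_c: mean and variance of the control group
    mu_d, var_d: mean and variance of the disease group
    The absolute difference is used so that up- and down-regulated
    markers are treated symmetrically.
    """
    return normal_cdf(abs(mu_d - mu_c) / sqrt(var_c + var_d))


# (mu_c, var_c, mu_d, var_d) copied from three rows of Table 12
rows = {
    "AMPM2": (3.05, 1.07e-02, 3.20, 3.62e-02),
    "ERBB1": (4.17, 2.25e-03, 4.10, 5.18e-03),
    "HSP90a": (4.45, 1.86e-02, 4.61, 1.86e-02),
}

for name, params in rows.items():
    print(name, round(binormal_auc(*params), 2))
```

Under this assumption the implied values fall close to the AUC column for these three rows, which suggests the tabulated means and variances are broadly consistent with the reported separation.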

TABLE 13 Parameters for benign nodules control group

#     Biomarker                    μ_c   σ_c²      μ_d   σ_d²      KS    p-value   AUC
(# = Biomarker # from Table 1)
2     ApoA-I                       3.83  1.04E−02  3.77  1.56E−02  0.24  1.67E−07  0.65
3     b-ECGF                       3.03  1.27E−03  3.06  1.53E−03  0.30  7.50E−12  0.68
4     BLC                          2.60  1.50E−02  2.72  3.97E−02  0.31  1.77E−12  0.70
5     BMP-1                        4.11  1.39E−02  4.00  2.01E−02  0.32  2.00E−13  0.72
8     C9                           5.31  4.84E−03  5.38  5.37E−03  0.39  9.42E−20  0.75
9     Cadherin E                   4.51  5.91E−03  4.43  9.86E−03  0.37  1.93E−17  0.74
10    Cadherin-6                   2.91  3.79E−03  2.98  1.12E−02  0.36  1.42E−16  0.72
11    Calpain I                    4.37  1.33E−02  4.50  2.32E−02  0.40  7.63E−21  0.75
12    Catalase                     4.27  2.09E−02  4.37  1.30E−02  0.34  4.30E−15  0.72
13    CATC                         2.80  5.83E−03  2.86  7.63E−03  0.31  8.55E−13  0.69
14    Cathepsin H                  4.59  3.24E−03  4.63  7.54E−03  0.30  4.29E−12  0.66
15    CD30 Ligand                  3.21  4.19E−03  3.26  4.42E−03  0.26  4.70E−09  0.68
17    CK-MB                        3.23  4.47E−02  3.07  4.89E−02  0.32  2.76E−13  0.70
18    CNDP1                        3.65  2.03E−02  3.52  3.07E−02  0.35  2.04E−15  0.72
20    CSK                          3.25  7.98E−02  3.54  1.10E−01  0.41  2.35E−21  0.76
23    ERBB1                        4.17  2.76E−03  4.10  5.18E−03  0.46  1.22E−26  0.77
24    FGF-17                       3.08  1.26E−03  3.11  1.31E−03  0.31  9.59E−13  0.71
26    GAPDH                        3.22  7.96E−02  3.51  1.62E−01  0.40  7.88E−21  0.69
27    HMG-1                        4.01  4.57E−02  4.19  7.55E−02  0.30  1.99E−11  0.70
28    HSP90a                       4.43  2.23E−02  4.61  1.86E−02  0.51  1.26E−33  0.81
29    HSP90b                       3.06  3.70E−03  3.14  9.67E−03  0.42  2.73E−22  0.75
30    IGFBP-2                      4.32  3.57E−02  4.48  4.17E−02  0.35  2.30E−15  0.73
32    IL-17B                       2.19  3.73E−03  2.23  4.16E−03  0.28  3.65E−10  0.68
33    IMB1                         3.47  2.21E−02  3.67  5.45E−02  0.42  2.04E−22  0.75
35    KPCI                         2.57  3.26E−03  2.66  1.01E−02  0.43  3.57E−23  0.75
37    LGMN                         3.13  2.03E−03  3.17  4.15E−03  0.30  1.15E−11  0.69
38    LRIG3                        3.55  3.59E−03  3.50  3.60E−03  0.33  9.00E−14  0.71
39    Macrophage mannose receptor  4.10  1.51E−02  4.22  2.48E−02  0.36  7.24E−17  0.72
40    MEK1                         2.81  1.77E−03  2.84  2.75E−03  0.31  3.79E−12  0.69
41    METAP1                       2.67  2.45E−02  2.89  5.83E−02  0.44  2.99E−24  0.75
44    MK13                         2.79  3.38E−03  2.85  4.88E−03  0.36  6.16E−17  0.74
45    MMP-7                        3.64  3.24E−02  3.82  4.85E−02  0.37  1.89E−17  0.73
46    NACA                         3.11  8.28E−03  3.21  2.63E−02  0.34  4.91E−15  0.70
47    NAGK                         3.71  2.04E−02  3.84  2.63E−02  0.38  7.50E−19  0.73
49    Proteinase-3                 3.95  9.09E−02  4.18  1.23E−01  0.30  2.22E−11  0.69
50    Prothrombin                  4.67  4.19E−02  4.53  4.31E−02  0.32  2.17E−13  0.68
54    RGM-C                        4.44  4.85E−03  4.38  6.13E−03  0.30  1.00E−11  0.69
55    SCF sR                       3.77  9.71E−03  3.68  1.48E−02  0.35  1.96E−15  0.72
60    VEGF                         3.55  8.80E−03  3.62  1.14E−02  0.30  1.27E−11  0.69
61    YES                          2.97  9.54E−04  3.00  1.73E−03  0.29  7.59E−11  0.67

TABLE 14 Sensitivity + Specificity for Exemplary Combinations of Biomarkers

#   Sensitivity  Specificity  Sensitivity + Specificity  AUC   Biomarkers
1   0.629        0.727        1.356                      0.75  SCF sR
2   0.761        0.753        1.514                      0.84  SCF sR, HSP90a
3   0.775        0.827        1.602                      0.87  SCF sR, HSP90a, ERBB1
4   0.784        0.861        1.645                      0.89  SCF sR, HSP90a, ERBB1, PTN
5   0.84         0.844        1.684                      0.9   SCF sR, HSP90a, ERBB1, PTN, BTK
6   0.822        0.869        1.691                      0.9   SCF sR, HSP90a, ERBB1, PTN, BTK, CD30 Ligand
7   0.845        0.875        1.72                       0.91  SCF sR, HSP90a, ERBB1, PTN, BTK, CD30 Ligand, Kallikrein 7
8   0.859        0.864        1.723                      0.91  SCF sR, HSP90a, ERBB1, PTN, BTK, CD30 Ligand, Kallikrein 7, LRIG3
9   0.869        0.872        1.741                      0.91  SCF sR, HSP90a, ERBB1, PTN, BTK, CD30 Ligand, Kallikrein 7, LRIG3, LDH-H1
10  0.873        0.878        1.751                      0.91  SCF sR, HSP90a, ERBB1, PTN, BTK, CD30 Ligand, Kallikrein 7, LRIG3, LDH-H1, PARC
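The "Sensitivity + Specificity" column of Table 14 is the arithmetic sum of the two preceding columns; a value of 1 corresponds to chance-level performance and 2 to a perfect classifier. The short sketch below recomputes that column from the tabulated sensitivity and specificity values and reports the gain obtained as each marker is added to the panel. The variable and function names are illustrative only and do not appear in the application.

```python
# (panel size, sensitivity, specificity, tabulated sum) copied from Table 14
table14 = [
    (1, 0.629, 0.727, 1.356),
    (2, 0.761, 0.753, 1.514),
    (3, 0.775, 0.827, 1.602),
    (4, 0.784, 0.861, 1.645),
    (5, 0.840, 0.844, 1.684),
    (6, 0.822, 0.869, 1.691),
    (7, 0.845, 0.875, 1.720),
    (8, 0.859, 0.864, 1.723),
    (9, 0.869, 0.872, 1.741),
    (10, 0.873, 0.878, 1.751),
]


def sens_plus_spec(sensitivity, specificity):
    """Return sensitivity + specificity, rounded to three decimals."""
    return round(sensitivity + specificity, 3)


# Recompute the sum column from the two component columns.
sums = [sens_plus_spec(sens, spec) for _, sens, spec, _ in table14]

# Gain in the summed metric when moving from an n-marker to an (n+1)-marker panel.
gains = [round(later - earlier, 3) for earlier, later in zip(sums, sums[1:])]

for (size, _, _, listed), computed in zip(table14, sums):
    print(size, computed, "matches table" if abs(computed - listed) < 5e-4 else "differs")
print("incremental gains:", gains)
```

In this table the summed metric increases with every added marker, although the increments become small beyond about five markers.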

TABLE 15 Parameters derived from training set for naïve Bayes classifier.

Biomarker  μ_c     μ_d     σ_c    σ_d    x̃       p(c|x̃)  p(d|x̃)  ln(p(c|x̃)/p(d|x̃))
C9         11.713  11.934  0.199  0.210  11.667  1.946   0.843   0.836
LRIG3      7.409   7.307   0.090  0.084  7.372   4.058   3.511   0.145
GAPDH      9.027   9.385   0.511  0.230  9.000   0.780   0.428   0.599
MMP12      6.139   6.346   0.096  0.255  6.129   4.115   1.087   1.332
KLK7       8.130   7.979   0.230  0.298  8.419   0.789   0.450   0.562
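The density and log-ratio columns of Table 15 can be approximately reproduced from the tabulated means, standard deviations, and measured values. The sketch below is a minimal illustration, assuming each biomarker is modeled by an independent Gaussian density per class (control c and disease d) and that the per-marker log ratios are summed with equal class priors. The function names are hypothetical, and small differences from the table are expected because the tabulated parameters are rounded.

```python
from math import exp, log, pi, sqrt


def gaussian_pdf(x, mu, sigma):
    """Gaussian probability density at x for mean mu and standard deviation sigma."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2.0 * pi))


def log_ratio(mu_c, mu_d, sigma_c, sigma_d, x):
    """ln of the control-class density over the disease-class density at x."""
    return log(gaussian_pdf(x, mu_c, sigma_c) / gaussian_pdf(x, mu_d, sigma_d))


# (mu_c, mu_d, sigma_c, sigma_d, measured value) copied from Table 15
table15 = {
    "C9": (11.713, 11.934, 0.199, 0.210, 11.667),
    "LRIG3": (7.409, 7.307, 0.090, 0.084, 7.372),
    "GAPDH": (9.027, 9.385, 0.511, 0.230, 9.000),
    "MMP12": (6.139, 6.346, 0.096, 0.255, 6.129),
    "KLK7": (8.130, 7.979, 0.230, 0.298, 8.419),
}

log_ratios = {name: log_ratio(*row) for name, row in table15.items()}

# Under the naive (independence) assumption the per-marker log ratios add.
total = sum(log_ratios.values())

for name, value in log_ratios.items():
    print(name, round(value, 3))
print("summed log ratio:", round(total, 3))
```

With equal priors, a positive summed log ratio means the measured values are more probable under the control-class model than under the disease-class model; unequal priors would add a constant ln(prior ratio) term to the sum.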

TABLE 16 Naïve Bayes parameters for all markers in Table 21 for bothtissue and serum Tissue Tissue Tissue Tissue Serum Serum Serum SerumBiomarker μ_(c) μ_(d) σ_(c) σ_(d) u_(c) μ_(d) σ_(c) σ_(d) Activin A5.927 6.713 0.124 0.816 6.990 7.060 0.089 0.120 Adiponectin 9.357 8.5600.456 0.154 8.986 9.141 0.406 0.391 AMPM2 9.352 9.916 0.393 0.313 7.0677.079 0.091 0.107 Apo A-I 6.554 6.573 0.312 0.271 8.699 8.593 0.1390.130 b-ECGF 6.731 7.256 0.606 0.739 6.205 6.160 0.056 0.072 BGN 7.9896.821 0.653 0.168 7.140 7.067 0.125 0.077 BLC 6.283 7.776 0.253 0.9717.065 7.058 0.066 0.072 BMP-1 5.149 5.377 0.071 0.239 8.766 8.548 0.2130.234 BTK 8.782 7.757 0.547 1.303 7.567 7.856 0.464 0.304 C1s 7.9897.973 0.206 0.298 8.532 8.540 0.106 0.121 C9 11.488 11.417 0.325 0.38011.715 11.936 0.189 0.223 Cadherin-6 7.129 7.502 0.034 0.265 7.971 7.9590.087 0.067 Cadherin E 7.370 7.916 0.458 0.349 9.252 9.050 0.200 0.181Calpain 1 9.641 9.962 0.503 0.553 10.358 10.466 0.132 0.143 Carbonicanhydrase III 9.096 7.504 0.288 0.890 8.552 8.687 0.474 0.351 Caspase-36.758 7.426 0.248 0.110 7.097 7.136 0.338 0.367 Catalase 11.253 10.5810.127 0.633 10.051 10.243 0.392 0.276 CATC 7.839 7.783 0.457 0.385 7.2487.229 0.088 0.088 Cathepsin H 12.452 11.758 0.158 0.410 9.485 9.5850.124 0.210 CD30 Ligand 6.655 6.911 0.069 0.168 7.622 7.605 0.038 0.035CD36 ANTIGEN 7.026 6.262 0.478 0.175 8.252 8.224 0.114 0.141 CDK5/p356.581 6.741 0.210 0.215 6.986 7.044 0.083 0.075 CK-MB 8.912 8.564 0.7030.611 7.515 7.230 0.317 0.307 CNDP1 7.293 7.292 0.062 0.189 9.995 9.7540.295 0.375 Contactin-5 5.793 5.777 0.063 0.146 6.749 6.689 0.109 0.141CSK 8.526 8.181 0.370 0.736 6.809 7.186 0.388 0.245 CXCL16, soluble8.216 7.559 0.517 0.449 9.660 9.744 0.185 0.230 Cyclophilin A 11.75111.668 0.159 0.123 8.586 8.784 0.323 0.233 Endostatin 8.669 8.096 0.2330.374 8.763 8.876 0.125 0.162 ERBB1 7.041 7.263 0.134 0.336 10.57810.428 0.119 0.135 ESAM 8.659 7.451 0.376 0.473 9.022 9.033 0.151 0.142FGF-17 6.111 5.998 0.036 0.066 6.897 6.902 0.062 0.069 
Fibronectin 9.68110.795 0.452 1.097 11.288 11.105 0.253 0.269 FYN 8.003 7.834 0.149 0.2628.002 8.033 0.123 0.086 GAPDH, liver 12.703 12.713 0.123 0.152 9.0339.410 0.536 0.194 HMG-1 11.639 11.541 0.545 0.615 8.430 8.546 0.1330.096 HSP 90a 11.569 11.820 0.479 0.279 9.165 9.343 0.226 0.182 HSP 90b8.509 9.422 0.974 0.960 7.635 7.653 0.053 0.059 IDE 8.426 9.023 0.3620.302 7.670 7.728 0.096 0.106 IGFBP-2 7.715 9.591 0.416 1.413 8.5149.006 0.417 0.448 IGFBP-5 7.619 9.347 0.282 1.263 9.705 9.675 0.1260.138 IGFBP-7 8.999 9.843 0.717 0.307 9.251 9.156 0.148 0.172 IL-15 Ra6.088 6.577 0.123 0.318 7.068 7.066 0.056 0.071 IL-17B 5.441 5.531 0.0510.139 6.267 6.257 0.052 0.066 IL-8 7.037 8.206 0.145 0.631 7.114 7.1090.052 0.066 IMB1 5.867 6.218 1.300 1.010 7.326 7.390 0.150 0.152Kallikrein 7 5.990 6.152 0.146 0.447 8.132 7.964 0.221 0.295 KPCI 6.5896.821 0.244 0.420 6.195 6.194 0.053 0.046 LDH-H 1 12.527 12.640 0.1350.169 7.221 7.261 0.140 0.198 LGMN 7.964 8.124 0.084 0.101 8.404 8.3770.074 0.070 LRIG3 6.198 6.213 0.383 0.336 7.411 7.301 0.090 0.092Macrophage 6.738 5.654 0.394 0.440 8.132 8.233 0.203 0.253 mannosereceptor MEK1 6.543 6.657 0.305 0.505 5.979 5.966 0.039 0.048 METAP19.004 9.807 0.540 0.412 7.955 7.982 0.095 0.081 Midkine 6.619 7.2230.770 1.112 7.714 7.714 0.298 0.193 MIP-5 5.582 5.657 0.041 0.090 8.5608.659 0.262 0.233 MK13 7.195 7.793 0.260 0.491 NA NA NA NA MMP-12 5.8228.677 0.182 1.045 6.129 6.323 0.100 0.260 MMP-7 6.800 8.224 0.440 0.2158.881 9.232 0.235 0.182 NACA 6.480 6.738 0.207 0.183 7.774 7.791 0.1110.108 NAGK 9.469 9.986 0.328 0.457 7.385 7.476 0.203 0.216 NAP-2 10.6729.447 0.357 0.842 7.765 7.775 0.286 0.342 PARC 9.519 9.315 0.537 0.16910.087 10.291 0.424 0.369 Proteinase-3 7.667 6.963 0.789 0.850 8.3408.394 0.461 0.504 Prothrombin 7.245 7.400 0.443 0.390 NA NA NA NAP-Selectin 7.947 6.593 0.263 0.508 9.937 9.944 0.278 0.199 PTN 7.3637.301 0.492 0.531 8.149 8.250 0.116 0.152 RAC1 11.522 11.299 0.109 0.2208.408 8.697 0.378 0.323 Renin 5.964 5.894 0.039 
0.080 7.675 7.797 0.3380.506 RGM-C 6.677 6.646 0.049 0.084 9.765 9.700 0.164 0.180 SCF sR 6.6076.639 0.163 0.175 9.603 9.503 0.139 0.141 SLPI 10.635 9.435 0.676 0.476NA NA NA NA sL-Selectin 6.524 6.827 0.149 0.166 NA NA NA NA sRAGE 11.1547.304 0.619 0.912 7.001 6.845 0.333 0.297 TCTP 10.524 10.395 0.087 0.1278.847 9.137 0.290 0.224 Thrombospondin-1 9.012 10.305 0.520 1.093 9.1878.950 0.558 0.349 TPSB2 10.798 9.138 0.668 1.055 7.714 7.435 0.346 0.441TrATPase 11.031 8.887 0.993 0.703 9.099 9.168 0.204 0.148 TSP2 6.5697.837 0.085 0.627 7.468 7.562 0.162 0.218 UBE2N 10.654 10.725 0.1660.140 9.234 9.487 0.521 0.288 Ubiquitin+1 10.948 10.860 0.249 0.2759.218 9.284 0.249 0.171 uPA 5.747 6.564 0.119 0.445 6.868 6.874 0.1040.126 URB 7.180 8.539 0.283 0.699 8.689 8.756 0.173 0.202 VEGF 6.3137.593 0.088 1.074 7.699 7.769 0.096 0.145 vWF 7.927 7.193 0.263 0.13910.531 10.684 0.236 0.200 YES 7.086 7.723 0.386 0.314 6.593 6.605 0.0650.067

TABLE 17 Patient demographics, resection location and tumor types for the eight NSCLC samples

Age  Sex  Smoking History  Location          Stage                 Tissue Dx
47   F    Smoker           Left Upper Lobe   pT3pN1pMx stage IIIA  Poorly differentiated non-small cell CA with focal Squamous differentiation
73   F    Smoker           Left Lower Lobe   pT2pN0pMx stage IB    Poorly differentiated Squamous cell carcinoma
48   M    Smoker           Right Upper Lobe  pT2pN1pMx stage IIIA  Poorly differentiated Squamous cell carcinoma
60   F    Smoker           Left Upper Lobe   T4 N1 M0 stage IIIB   Poorly differentiated Squamous cell carcinoma
                                             (note: T4 distinction based on clinical lung collapse; tumor was pT2 by size criteria)
51   F    Smoker           Right Upper Lobe  pT2pN0pMx stage IB    Moderately differentiated Adenocarcinoma
71   F    Smoker           Right Upper Lobe  pT2pN0pMx stage IB    Well differentiated Adenocarcinoma
75   F    Smoker           Right Lower Lobe  pT1N0Mx Stage IA      Well differentiated Adenocarcinoma
73   M    Smoker           Left Upper Lobe   pT1bN0Mx Stage IA     Atypical Carcinoid Tumor (i.e. neuroendocrine, IHC positive for chromogranin)

TABLE 18 Differentially Expressed Biomarkers Between Tumor and NormalTissue Biomarker Up/Down # Biomarker Designation Alternate Protein NamesGene Regulated 1 Activin A Inhibin beta A chain INHBA up Activin beta-Achain Erythroid differentiation protein EDF 2 Adiponectin 30 kDaadipocyte complement-related protein ADIPOQ down Adipocytecomplement-related 30 kDa protein ACRP30 Adipocyte, C1q and collagendomain-containing protein Adipose most abundant gene transcript 1protein apM-1 Gelatin-binding protein Adipolean 3 BCA-1* C-X-C motifchemokine 13 CXCL13 up Angie B cell-attracting chemokine 1 B lymphocytechemoattractant CXC chemokine BLC Small-inducible cytokine B13 BLC 4Biglycan Bone/cartilage proteoglycan I BGN down PG-S1 5 Cadherin-1* CAM120/80 CDH1 up Epithelial cadherin E-cadherin Uvomorulin CD324 6Carbonic anhydrase III Carbonic anhydrase 3 CA3 down Carbonatedehydratase III CA-III 7 Caspase-3 CASP-3 CASP3 up Apopain Cysteineprotease CPP32 CPP-32 Protein Yama SREBP cleavage activity 1 SCA-1 8Catalase* CAT down 9 CD36 Antigen Platelet glycoprotein 4 CD36 downFatty acid translocase FAT Glycoprotein IIIb GPIIIB Leukocytedifferentiation antigen CD36 PAS IV PAS-4 Platelet collagen receptorPlatelet glycoprotein IV GPIV Thrombospondin receptor 10 CXCL16, solubleC-X-C motif chemokine 16 CXCL16 down Scavenger receptor forphosphatidylserine and oxidized low density lipoprotein SR-PSOXSmall-inducible cytokine B16 Transmembrane chemokine CSCL16 11Endostatin* COL18A1 down 12 ESAM Endothelial cell-selective adhesionmolecule ESAM down 13 Fibronectin FN FN1 up Cold-insoluble globulin CIGFNT 14 Insulysin Insulin-degrading enzyme IDE up Abeta-degradingprotease Insulin protease Insulinase 15 IGFBP-2* Insulin-like growthfactor-binding protein 2 IGFBP2 up IBP-2 IGF-binding protein 2 16IGFBP-5 Insulin-like growth factor-binding protein 5 IGFBP5 up IBP-5IGF-binding protein 5 17 IGFBP-7 Insulin-like growth factor-bindingprotein 7 IGFBP7 up IBP-7 IGF-binding protein 7 IGFBP-rP1 MAC25 
proteinPGI2-stimulating factor Prostacyclin-stimulating factor Tumor-derivedadhesion factor TAF 18 IL-8 Interleukin-8 IL8 up C-X-C motif chemokine 8Emoctakin Granulocyte chemotactic protein 1 GCP-1 Monocyte-derivedneutrophil chemotactic factor MDNCF Monocyte-derivedneutrophil-activating peptide MONAP Neutrophil-activating protein NAP-1Protein 3-10C T-cell chemotactic factor 19 MRC1* Macrophage mannosereceptor 1 MRC1 down MMR C-type lectin domain family 13 member D C-typelectin domain family 13 member D-like Macrophage mannose receptor 1-likeprotein 1 CD206 20 MAPK13* Mitogen-activated protein kinase 13 MAPK13 upMAP kinase 13 Mitogen-activated protein kinase p38 delta MAP kinase p38delta Stress-activated protein kinase 4 21 MMP-7* Matrilysin MMP7 upMatrin Matrix metalloproteinase-7 Pump-1 protease Uterinemetalloproteinase 22 MMP-12* Macrophage metalloelastase MMP12 up MMEMacrophage elastase ME hME Matrix metalloproteinase-12 23 NAGK*N-acetyl-D-glucosamine kinase NAGK up N-acetylglucosamine kinase GlcNAckinase 24 NAP-2 Neutrophil-activating peptide 2 PPBP down 25 P-SelectinCD62 antigen-like family member P SELP down Granule membrane protein 140GMP-140 Leukocyte-endothelial cell adhesion molecule 3 LECAM-3 Plateletactivation dependent granule-external membrane protein PADGEM CD62P 26SLPI Antileukoproteinase SLPI down ALP BLPI HUSI-1 Mucus proteinaseinhibitor MPI Protease inhibitor WAP4 Secretory leukocyte proteaseinhibitor Seminal proteinase inhibitor WAP four-disulfide core domainprotein 4 27 sRAGE Advanced glycosylation end product-specific AGER downreceptor Receptor for advanced glycosylation end products 28Thrombospondin-1 TSP-1 THBS1 up 29 Thrombospondin-2 TSP-2 THBS1 up 30TrATPase Tartrate-resistant acid phosphatase type 5 ACP5 down TR-APTartrate-resistant acid ATPase Type 5 acid phosphatase 31 Tryptase β-2Tryptase beta-2 TPSB2 down Tryptase-2 Tryptase II TRYB2 32 uPAUrokinase-type plasminogen activator PLAU up U-plasminogen activatorUrokinase 33 URB Coiled-coil 
domain-containing protein 80 CCDC80 upDown-regulated by oncogenes protein 1 Up-regulated in BRS-3 deficientmouse homolog 34 VEGF* Vascular endothelial growth factor A VEGFA upVEGF-A Vascular permeability factor VPF 35 vWF von Willebrand factor VWFdown 36 YES* Tyrosine-protein kinase Yes YES1 up Proto-oncogene c-YesP61-Yes *Overlap of Biomarkers Expressed in both Serum and Tumor Tissue

TABLE 19 Categorization of NSCLC tissue biomarkers into biologicalprocesses Inflamma- Invasion, Growth and tion & Metastasis AngiogenesisMetabolism Apoptosis (ECM) VEGF Adiponectin* Activin A Biglycan*Endostatin Carbonic BCA-1* Cadherin-1 anhydrase III* Thrombospondin-1IGFBP-2 Catalase CD36 Antigen Thrombospondin-2 IGFBP-5 CXCL16, ESAMsoluble* IGFBP-7 IL-8 Fibronectin* Insulysin* MRC1* MMP-7 NAGK* NAP-2MMP-12 TrATPase* sRAGE P-Selectin* Tryptase b-2 SLPI URB* MAPK13* uPAvWF Caspase-3 Thrombospondin-1 Thrombospondin-2 YES *Novel NSCLCBiomarker

TABLE 20 Biomarkers Identified in NSCLC Tissue*

Biomarker #  Biomarker Designation
1            Activin A
2            Adiponectin
3            Biglycan
4            Carbonic anhydrase III
5            Caspase-3
6            CD36 Antigen
7            CXCL16, soluble
8            ESAM
9            Fibronectin
10           Insulysin
11           IGFBP-5
12           IGFBP-7
13           IL-8
14           MMP-12
15           NAP-2
16           P-Selectin
17           SLPI
18           sRAGE
19           Thrombospondin-1
20           Thrombospondin-2
21           TrATPase
22           Tryptase β-2
23           uPA
24           URB
25           vWF

*This list excludes biomarkers which were identified in both tissue and serum samples

TABLE 21 Biomarkers Identified in Serum and Tissue

Biomarker #  Biomarker Designation
1            Activin A
2            Adiponectin
3            AMPM2
4            Apo A-I
5            Biglycan
6            b-ECGF
7            BLC*
8            BMP-1
9            BTK
10           C1s
11           C9
12           Cadherin E*
13           Cadherin-6
14           Calpain I
15           Carbonic anhydrase III
16           Caspase-3
17           Catalase*
18           CATC
19           Cathepsin H
20           CD30 Ligand
21           CD36 Antigen
22           CDK5-p35
23           CK-MB
24           CNDP1
25           Contactin-5
26           CSK
27           CXCL16, soluble
28           Cyclophilin A
29           Endostatin*
30           ERBB1
31           ESAM
32           FGF-17
33           Fibronectin
34           FYN
35           GAPDH, liver
36           HMG-1
37           HSP 90a
38           HSP 90b
39           IGFBP-2*
40           IGFBP-5
41           IGFBP-7
42           IL-8
43           IL-15 Ra
44           IL-17B
45           IMB1
46           Insulysin
47           Kallikrein 7
48           KPCI
49           LDH-H 1
50           LGMN
51           LRIG3
52           Macrophage mannose receptor*
53           MEK1
54           METAP1
55           Midkine
56           MIP-5
57           MK13*
58           MMP-7*
59           MMP-12*
60           NACA
61           NAGK*
62           NAP-2
63           PARC
64           P-Selectin
65           Proteinase-3
66           Prothrombin
67           PTN
68           RAC1
69           Renin
70           RGM-C
71           SCF sR
72           SLPI
73           sL-Selectin
74           sRAGE
75           TCTP
76           Thrombospondin-1
77           Thrombospondin-2
78           TrATPase
79           Tryptase β-2
80           UBE2N
81           Ubiquitin+1
82           uPA
83           URB
84           VEGF*
85           vWF
86           YES*

*Biomarkers identified in both serum and tissue

TABLE 22 81 Panels of two biomarkers including MMP-12

#   Markers                               AUC
1   CSK                          MMP-12   0.848
2   GAPDH, liver                 MMP-12   0.842
3   Cyclophilin A                MMP-12   0.832
4   TCTP                         MMP-12   0.831
5   C9                           MMP-12   0.828
6   LRIG3                        MMP-12   0.826
7   MMP-7                        MMP-12   0.824
8   BMP-1                        MMP-12   0.823
9   SCF sR                       MMP-12   0.823
10  ERBB1                        MMP-12   0.822
11  RAC1                         MMP-12   0.822
12  Kallikrein 7                 MMP-12   0.822
13  HSP 90a                      MMP-12   0.817
14  CDK5/p35                     MMP-12   0.815
15  IGFBP-2                      MMP-12   0.812
16  HMG-1                        MMP-12   0.809
17  Cadherin E                   MMP-12   0.808
18  b-ECGF                       MMP-12   0.807
19  Calpain I                    MMP-12   0.805
20  RGM-C                        MMP-12   0.804
21  IMB1                         MMP-12   0.802
22  UBE2N                        MMP-12   0.802
23  LGMN                         MMP-12   0.801
24  Catalase                     MMP-12   0.801
25  CK-MB                        MMP-12   0.800
26  BTK                          MMP-12   0.799
27  Endostatin                   MMP-12   0.791
28  BGN                          MMP-12   0.791
29  PTN                          MMP-12   0.790
30  CD30 Ligand                  MMP-12   0.789
31  Activin A                    MMP-12   0.785
32  vWF                          MMP-12   0.784
33  TSP2                         MMP-12   0.784
34  IL-8                         MMP-12   0.782
35  Adiponectin                  MMP-12   0.781
36  Thrombospondin-1             MMP-12   0.779
37  NAGK                         MMP-12   0.777
38  MIP-5                        MMP-12   0.776
39  VEGF                         MMP-12   0.776
40  NACA                         MMP-12   0.773
41  LDH-H 1                      MMP-12   0.771
42  CNDP1                        MMP-12   0.770
43  IGFBP-7                      MMP-12   0.770
44  Proteinase-3                 MMP-12   0.769
45  TPSB2                        MMP-12   0.769
46  Apo A-I                      MMP-12   0.768
47  Macrophage mannose receptor  MMP-12   0.768
48  Ubiquitin+1                  MMP-12   0.767
49  IDE                          MMP-12   0.767
50  Cathepsin H                  MMP-12   0.766
51  CXCL16, soluble              MMP-12   0.763
52  TrATPase                     MMP-12   0.762
53  Caspase-3                    MMP-12   0.757
54  Cadherin-6                   MMP-12   0.757
55  Contactin-5                  MMP-12   0.756
56  BLC                          MMP-12   0.756
57  FGF-17                       MMP-12   0.755
58  Fibronectin                  MMP-12   0.754
59  NAP-2                        MMP-12   0.754
60  HSP 90b                      MMP-12   0.754
61  C1s                          MMP-12   0.753
62  AMPM2                        MMP-12   0.752
63  IL-17B                       MMP-12   0.752
64  IL-15 Ra                     MMP-12   0.751
65  uPA                          MMP-12   0.750
66  PARC                         MMP-12   0.749
67  IGFBP-5                      MMP-12   0.748
68  Renin                        MMP-12   0.745
69  KPCI                         MMP-12   0.742
70  METAP1                       MMP-12   0.742
71  Carbonic anhydrase III       MMP-12   0.740
72  CATC                         MMP-12   0.740
73  MEK1                         MMP-12   0.740
74  URB                          MMP-12   0.736
75  CD36 ANTIGEN                 MMP-12   0.735
76  Midkine                      MMP-12   0.735
77  sRAGE                        MMP-12   0.731
78  ESAM                         MMP-12   0.729
79  YES                          MMP-12   0.728
80  P-Selectin                   MMP-12   0.723
81  FYN                          MMP-12   0.707

TABLE 23 100 Panels of three biomarkers including MMP-12 Markers AUC 1C9 GAPDH, liver MMP-12 0.879 2 MMP-7 GAPDH, liver MMP-12 0.876 3 C9 CSKMMP-12 0.875 4 BMP-1 CSK MMP-12 0.869 5 BMP-1 GAPDH, liver MMP-12 0.8686 LRIG3 GAPDH, liver MMP-12 0.867 7 Kallikrein 7 GAPDH, liver MMP-120.867 8 MMP-7 CSK MMP-12 0.867 9 RAC1 C9 MMP-12 0.865 10 CSK Kallikrein7 MMP-12 0.865 11 IGFBP-2 GAPDH, liver MMP-12 0.864 12 CDK5/p35 GAPDH,liver MMP-12 0.862 13 C9 TCTP MMP-12 0.862 14 C9 Cyclophilin A MMP-120.862 15 LRIG3 CSK MMP-12 0.862 16 SCF sR CSK MMP-12 0.861 17 SCF sRGAPDH, liver MMP-12 0.861 18 IGFBP-2 CSK MMP-12 0.860 19 MMP-7 TCTPMMP-12 0.860 20 b-ECGF GAPDH, liver MMP-12 0.860 21 ERBB1 CSK MMP-120.859 22 RAC1 BMP-1 MMP-12 0.858 23 LRIG3 TCTP MMP-12 0.858 24 ERBB1GAPDH, liver MMP-12 0.857 25 BMP-1 TCTP MMP-12 0.857 26 RGM-C CSK MMP-120.857 27 CSK b-ECGF MMP-12 0.856 28 RAC1 Kallikrein 7 MMP-12 0.856 29CDK5/p35 CSK MMP-12 0.856 30 HMG-1 MMP-7 MMP-12 0.856 31 CK-MB GAPDH,liver MMP-12 0.855 32 RAC1 CDK5/p35 MMP-12 0.855 33 CSK Thrombospondin-1MMP-12 0.854 34 C9 BTK MMP-12 0.854 35 RAC1 LRIG3 MMP-12 0.854 36 HSP90a C9 MMP-12 0.854 37 Activin A GAPDH, liver MMP-12 0.854 38 HSP 90aBMP-1 MMP-12 0.854 39 Endostatin CSK MMP-12 0.853 40 CSK GAPDH, liverMMP-12 0.853 41 BMP-1 Cyclophilin A MMP-12 0.853 42 ERBB1 TCTP MMP-120.853 43 GAPDH, liver TCTP MMP-12 0.853 44 LGMN GAPDH, liver MMP-120.853 45 HSP 90a LRIG3 MMP-12 0.853 46 C9 Kallikrein 7 MMP-12 0.852 47SCF sR LRIG3 MMP-12 0.852 48 Calpain I C9 MMP-12 0.852 49 C9 CatalaseMMP-12 0.852 50 HMG-1 C9 MMP-12 0.852 51 C9 LRIG3 MMP-12 0.852 52 LRIG3Kallikrein 7 MMP-12 0.852 53 SCF sR TCTP MMP-12 0.851 54 SCF sRCyclophilin A MMP-12 0.851 55 BMP-1 UBE2N MMP-12 0.851 56 Kallikrein 7TCTP MMP-12 0.851 57 MMP-7 UBE2N MMP-12 0.851 58 MMP-7 RAC1 MMP-12 0.85159 Kallikrein 7 Cyclophilin A MMP-12 0.850 60 MIP-5 GAPDH, liver MMP-120.850 61 CSK CK-MB MMP-12 0.850 62 MMP-7 Cyclophilin A MMP-12 0.850 63CSK LGMN MMP-12 0.850 64 RGM-C GAPDH, liver MMP-12 0.850 
65 CDK5/p35TCTP MMP-12 0.850 66 PTN GAPDH, liver MMP-12 0.850 67 Adiponectin GAPDH,liver MMP-12 0.850 68 LRIG3 UBE2N MMP-12 0.849 69 Thrombospondin-1GAPDH, liver MMP-12 0.849 70 SCF sR C9 MMP-12 0.849 71 CSK CatalaseMMP-12 0.849 72 Endostatin GAPDH, liver MMP-12 0.849 73 SCF sR RAC1MMP-12 0.849 74 RAC1 b-ECGF MMP-12 0.849 75 TPSB2 GAPDH, liver MMP-120.849 76 C9 UBE2N MMP-12 0.849 77 b-ECGF TCTP MMP-12 0.849 78 C9 IMB1MMP-12 0.849 79 Calpain I GAPDH, liver MMP-12 0.848 80 CSK IL-8 MMP-120.848 81 CSK Adiponectin MMP-12 0.848 82 Kallikrein 7 IMB1 MMP-12 0.84883 Calpain I CSK MMP-12 0.848 84 Macrophage GAPDH, liver MMP-12 0.848mannose receptor 85 SCF sR CDK5/p35 MMP-12 0.848 86 IGFBP-2 CyclophilinA MMP-12 0.848 87 CDK5/p35 BTK MMP-12 0.848 88 Macrophage CSK MMP-120.847 mannose receptor 89 Cadherin E IGFBP-2 MMP-12 0.847 90Thrombospondin-1 TCTP MMP-12 0.847 91 ERBB1 C9 MMP-12 0.847 92 RAC1RGM-C MMP-12 0.847 93 ERBB1 Cyclophilin A MMP-12 0.847 94 CXCL16,soluble GAPDH, liver MMP-12 0.847 95 RGM-C Cyclophilin A MMP-12 0.847 96LRIG3 Cyclophilin A MMP-12 0.847 97 ERBB1 RAC1 MMP-12 0.847 98Kallikrein 7 UBE2N MMP-12 0.847 99 MMP-7 LRIG3 MMP-12 0.847 100 BMP-1BTK MMP-12 0.847

TABLE 24
100 Panels of four biomarkers including MMP-12
Panel | Markers | AUC
1 | MMP-7; C9; GAPDH, liver; MMP-12 | 0.892
2 | C9; LRIG3; GAPDH, liver; MMP-12 | 0.890
3 | MMP-7; LRIG3; GAPDH, liver; MMP-12 | 0.889
4 | SCF sR; C9; GAPDH, liver; MMP-12 | 0.889
5 | IGFBP-2; C9; GAPDH, liver; MMP-12 | 0.889
6 | C9; LGMN; GAPDH, liver; MMP-12 | 0.889
7 | IGFBP-2; MMP-7; GAPDH, liver; MMP-12 | 0.889
8 | MMP-7; BMP-1; GAPDH, liver; MMP-12 | 0.889
9 | C9; Kallikrein 7; GAPDH, liver; MMP-12 | 0.889
10 | C9; BMP-1; GAPDH, liver; MMP-12 | 0.888
11 | ERBB1; C9; GAPDH, liver; MMP-12 | 0.887
12 | MMP-7; CDK5/p35; GAPDH, liver; MMP-12 | 0.886
13 | C9; b-ECGF; GAPDH, liver; MMP-12 | 0.886
14 | Cadherin E; MMP-7; GAPDH, liver; MMP-12 | 0.885
15 | MMP-7; TPSB2; GAPDH, liver; MMP-12 | 0.885
16 | Macrophage mannose receptor; C9; GAPDH, liver; MMP-12 | 0.885
17 | MMP-7; IGFBP-7; GAPDH, liver; MMP-12 | 0.885
18 | MMP-7; CSK; GAPDH, liver; MMP-12 | 0.884
19 | MMP-7; b-ECGF; GAPDH, liver; MMP-12 | 0.884
20 | C9; Adiponectin; GAPDH, liver; MMP-12 | 0.884
21 | C9; TPSB2; GAPDH, liver; MMP-12 | 0.884
22 | HMG-1; MMP-7; CSK; MMP-12 | 0.884
23 | SCF sR; MMP-7; GAPDH, liver; MMP-12 | 0.884
24 | MMP-7; Thrombospondin-1; GAPDH, liver; MMP-12 | 0.883
25 | CXCL16, soluble; C9; GAPDH, liver; MMP-12 | 0.883
26 | MMP-7; Kallikrein 7; GAPDH, liver; MMP-12 | 0.883
27 | C9; MIP-5; GAPDH, liver; MMP-12 | 0.883
28 | SCF sR; BMP-1; GAPDH, liver; MMP-12 | 0.883
29 | C9; CSK; LGMN; MMP-12 | 0.883
30 | C9; GAPDH, liver; LDH-H 1; MMP-12 | 0.882
31 | C9; RGM-C; GAPDH, liver; MMP-12 | 0.882
32 | Macrophage mannose receptor; MMP-7; GAPDH, liver; MMP-12 | 0.882
33 | Endostatin; C9; GAPDH, liver; MMP-12 | 0.882
34 | MMP-7; RGM-C; GAPDH, liver; MMP-12 | 0.882
35 | LRIG3; Kallikrein 7; GAPDH, liver; MMP-12 | 0.882
36 | C9; Cadherin-6; GAPDH, liver; MMP-12 | 0.882
37 | MMP-7; LRIG3; CSK; MMP-12 | 0.882
38 | MMP-7; GAPDH, liver; TCTP; MMP-12 | 0.882
39 | C9; CSK; Kallikrein 7; MMP-12 | 0.881
40 | MMP-7; GAPDH, liver; LDH-H 1; MMP-12 | 0.881
41 | MMP-7; LGMN; GAPDH, liver; MMP-12 | 0.881
42 | ERBB1; MMP-7; GAPDH, liver; MMP-12 | 0.881
43 | HMG-1; MMP-7; GAPDH, liver; MMP-12 | 0.881
44 | IGFBP-2; Kallikrein 7; GAPDH, liver; MMP-12 | 0.881
45 | C9; Thrombospondin-1; GAPDH, liver; MMP-12 | 0.881
46 | C9; CDK5/p35; GAPDH, liver; MMP-12 | 0.881
47 | ERBB1; C9; CSK; MMP-12 | 0.881
48 | MMP-7; BMP-1; CSK; MMP-12 | 0.881
49 | Cadherin E; MMP-7; CSK; MMP-12 | 0.881
50 | SCF sR; CDK5/p35; GAPDH, liver; MMP-12 | 0.881
51 | C9; RGM-C; CSK; MMP-12 | 0.881
52 | C9; GAPDH, liver; NACA; MMP-12 | 0.880
53 | C9; LRIG3; CSK; MMP-12 | 0.880
54 | MMP-7; Adiponectin; GAPDH, liver; MMP-12 | 0.880
55 | C9; CSK; GAPDH, liver; MMP-12 | 0.880
56 | LRIG3; BMP-1; GAPDH, liver; MMP-12 | 0.880
57 | MMP-7; PTN; GAPDH, liver; MMP-12 | 0.880
58 | C9; CSK; Thrombospondin-1; MMP-12 | 0.880
59 | Activin A; C9; GAPDH, liver; MMP-12 | 0.880
60 | Endostatin; C9; CSK; MMP-12 | 0.880
61 | IGFBP-2; BMP-1; GAPDH, liver; MMP-12 | 0.880
62 | IGFBP-2; LRIG3; GAPDH, liver; MMP-12 | 0.880
63 | SCF sR; MMP-7; CSK; MMP-12 | 0.880
64 | Cadherin E; C9; GAPDH, liver; MMP-12 | 0.880
65 | SCF sR; LRIG3; GAPDH, liver; MMP-12 | 0.880
66 | Calpain I; C9; GAPDH, liver; MMP-12 | 0.879
67 | RAC1; C9; Kallikrein 7; MMP-12 | 0.879
68 | C9; LRIG3; TCTP; MMP-12 | 0.879
69 | C9; CDK5/p35; CSK; MMP-12 | 0.879
70 | C9; CSK; LDH-H 1; MMP-12 | 0.879
71 | ERBB1; MMP-7; CSK; MMP-12 | 0.879
72 | Activin A; MMP-7; GAPDH, liver; MMP-12 | 0.879
73 | IGFBP-2; Thrombospondin-1; GAPDH, liver; MMP-12 | 0.879
74 | C9; Proteinase-3; GAPDH, liver; MMP-12 | 0.878
75 | vWF; C9; GAPDH, liver; MMP-12 | 0.878
76 | MMP-7; CNDP1; GAPDH, liver; MMP-12 | 0.878
77 | C9; BMP-1; CSK; MMP-12 | 0.878
78 | C9; CK-MB; GAPDH, liver; MMP-12 | 0.878
79 | IGFBP-2; MMP-7; CSK; MMP-12 | 0.878
80 | MMP-7; GAPDH, liver; Fibronectin; MMP-12 | 0.878
81 | MMP-7; CD30 Ligand; GAPDH, liver; MMP-12 | 0.878
82 | C9; CDK5/p35; TCTP; MMP-12 | 0.878
83 | C9; CNDP1; GAPDH, liver; MMP-12 | 0.878
84 | Calpain I; C9; CSK; MMP-12 | 0.878
85 | MMP-7; C9; CSK; MMP-12 | 0.877
86 | MMP-7; CK-MB; GAPDH, liver; MMP-12 | 0.877
87 | Calpain I; MMP-7; GAPDH, liver; MMP-12 | 0.877
88 | SCF sR; C9; CSK; MMP-12 | 0.877
89 | MMP-7; Cadherin-6; GAPDH, liver; MMP-12 | 0.877
90 | MMP-7; Catalase; GAPDH, liver; MMP-12 | 0.877
91 | MMP-7; CDK5/p35; CSK; MMP-12 | 0.877
92 | MMP-7; RAC1; GAPDH, liver; MMP-12 | 0.877
93 | SCF sR; Kallikrein 7; GAPDH, liver; MMP-12 | 0.877
94 | C9; Catalase; GAPDH, liver; MMP-12 | 0.877
95 | C9; FGF-17; GAPDH, liver; MMP-12 | 0.877
96 | HMG-1; MMP-7; TCTP; MMP-12 | 0.877
97 | ERBB1; C9; TCTP; MMP-12 | 0.877
98 | MMP-7; GAPDH, liver; NACA; MMP-12 | 0.877
99 | ERBB1; BMP-1; GAPDH, liver; MMP-12 | 0.877
100 | HSP 90a; C9; LRIG3; MMP-12 | 0.877

TABLE 25
100 Panels of five biomarkers including MMP-12
Panel | Markers | AUC
1 | C9; LRIG3; Kallikrein 7; GAPDH, liver; MMP-12 | 0.900
2 | MMP-7; C9; LRIG3; GAPDH, liver; MMP-12 | 0.900
3 | SCF sR; C9; LRIG3; GAPDH, liver; MMP-12 | 0.900
4 | SCF sR; C9; BMP-1; GAPDH, liver; MMP-12 | 0.898
5 | IGFBP-2; MMP-7; LRIG3; GAPDH, liver; MMP-12 | 0.898
6 | IGFBP-2; C9; Kallikrein 7; GAPDH, liver; MMP-12 | 0.897
7 | MMP-7; C9; BMP-1; GAPDH, liver; MMP-12 | 0.897
8 | MMP-7; C9; TPSB2; GAPDH, liver; MMP-12 | 0.897
9 | IGFBP-2; MMP-7; Thrombospondin-1; GAPDH, liver; MMP-12 | 0.897
10 | MMP-7; C9; RGM-C; GAPDH, liver; MMP-12 | 0.897
11 | HMG-1; MMP-7; C9; GAPDH, liver; MMP-12 | 0.897
12 | Macrophage mannose receptor; C9; LRIG3; GAPDH, liver; MMP-12 | 0.897
13 | C9; LRIG3; LGMN; GAPDH, liver; MMP-12 | 0.897
14 | C9; Kallikrein 7; LGMN; GAPDH, liver; MMP-12 | 0.897
15 | Cadherin E; MMP-7; BMP-1; GAPDH, liver; MMP-12 | 0.897
16 | Macrophage mannose receptor; C9; Kallikrein 7; GAPDH, liver; MMP-12 | 0.897
17 | MMP-7; C9; Kallikrein 7; GAPDH, liver; MMP-12 | 0.896
18 | Cadherin E; IGFBP-2; MMP-7; GAPDH, liver; MMP-12 | 0.896
19 | MMP-7; C9; b-ECGF; GAPDH, liver; MMP-12 | 0.896
20 | IGFBP-2; C9; Thrombospondin-1; GAPDH, liver; MMP-12 | 0.896
21 | SCF sR; C9; CDK5/p35; GAPDH, liver; MMP-12 | 0.896
22 | C9; BMP-1; LGMN; GAPDH, liver; MMP-12 | 0.896
23 | MMP-7; LRIG3; BMP-1; GAPDH, liver; MMP-12 | 0.896
24 | MMP-7; C9; CDK5/p35; GAPDH, liver; MMP-12 | 0.896
25 | SCF sR; MMP-7; CDK5/p35; GAPDH, liver; MMP-12 | 0.896
26 | SCF sR; MMP-7; BMP-1; GAPDH, liver; MMP-12 | 0.896
27 | IGFBP-2; C9; LRIG3; GAPDH, liver; MMP-12 | 0.896
28 | SCF sR; MMP-7; C9; GAPDH, liver; MMP-12 | 0.895
29 | Macrophage mannose receptor; ERBB1; C9; GAPDH, liver; MMP-12 | 0.895
30 | IGFBP-2; MMP-7; Kallikrein 7; GAPDH, liver; MMP-12 | 0.895
31 | IGFBP-2; MMP-7; CDK5/p35; GAPDH, liver; MMP-12 | 0.895
32 | MMP-7; C9; GAPDH, liver; LDH-H 1; MMP-12 | 0.895
33 | IGFBP-2; MMP-7; C9; GAPDH, liver; MMP-12 | 0.895
34 | Macrophage mannose receptor; MMP-7; C9; GAPDH, liver; MMP-12 | 0.895
35 | MMP-7; IGFBP-7; BMP-1; GAPDH, liver; MMP-12 | 0.895
36 | C9; BMP-1; Kallikrein 7; GAPDH, liver; MMP-12 | 0.895
37 | SCF sR; MMP-7; LRIG3; GAPDH, liver; MMP-12 | 0.895
38 | SCF sR; IGFBP-2; MMP-7; GAPDH, liver; MMP-12 | 0.895
39 | C9; b-ECGF; LGMN; GAPDH, liver; MMP-12 | 0.895
40 | MMP-7; C9; LGMN; GAPDH, liver; MMP-12 | 0.895
41 | SCF sR; HMG-1; MMP-7; CSK; MMP-12 | 0.895
42 | IGFBP-2; MMP-7; IGFBP-7; GAPDH, liver; MMP-12 | 0.895
43 | Cadherin E; MMP-7; CDK5/p35; GAPDH, liver; MMP-12 | 0.895
44 | MMP-7; C9; Thrombospondin-1; GAPDH, liver; MMP-12 | 0.895
45 | Macrophage mannose receptor; MMP-7; LRIG3; GAPDH, liver; MMP-12 | 0.895
46 | MMP-7; IGFBP-7; LRIG3; GAPDH, liver; MMP-12 | 0.895
47 | IGFBP-2; MMP-7; BMP-1; GAPDH, liver; MMP-12 | 0.895
48 | MMP-7; BMP-1; CSK; GAPDH, liver; MMP-12 | 0.895
49 | MMP-7; LRIG3; Kallikrein 7; GAPDH, liver; MMP-12 | 0.895
50 | IGFBP-2; MMP-7; TPSB2; GAPDH, liver; MMP-12 | 0.895
51 | MMP-7; BMP-1; GAPDH, liver; LDH-H 1; MMP-12 | 0.894
52 | C9; Kallikrein 7; TPSB2; GAPDH, liver; MMP-12 | 0.894
53 | Cadherin E; MMP-7; b-ECGF; GAPDH, liver; MMP-12 | 0.894
54 | SCF sR; C9; Kallikrein 7; GAPDH, liver; MMP-12 | 0.894
55 | IGFBP-2; MMP-7; GAPDH, liver; LDH-H 1; MMP-12 | 0.894
56 | SCF sR; C9; Thrombospondin-1; GAPDH, liver; MMP-12 | 0.894
57 | MMP-7; BMP-1; Thrombospondin-1; GAPDH, liver; MMP-12 | 0.894
58 | MMP-7; C9; CSK; GAPDH, liver; MMP-12 | 0.894
59 | Endostatin; C9; LRIG3; GAPDH, liver; MMP-12 | 0.894
60 | C9; LRIG3; Thrombospondin-1; GAPDH, liver; MMP-12 | 0.894
61 | C9; LRIG3; BMP-1; GAPDH, liver; MMP-12 | 0.894
62 | IGFBP-2; MMP-7; b-ECGF; GAPDH, liver; MMP-12 | 0.894
63 | IGFBP-2; MMP-7; LGMN; GAPDH, liver; MMP-12 | 0.894
64 | MMP-7; C9; Cadherin-6; GAPDH, liver; MMP-12 | 0.894
65 | Cadherin E; MMP-7; C9; GAPDH, liver; MMP-12 | 0.894
66 | HMG-1; MMP-7; C9; CSK; MMP-12 | 0.894
67 | SCF sR; C9; Adiponectin; GAPDH, liver; MMP-12 | 0.894
68 | CXCL16, soluble; C9; LRIG3; GAPDH, liver; MMP-12 | 0.894
69 | IGFBP-2; MMP-7; CSK; GAPDH, liver; MMP-12 | 0.894
70 | SCF sR; Macrophage mannose receptor; C9; GAPDH, liver; MMP-12 | 0.894
71 | MMP-7; C9; Adiponectin; GAPDH, liver; MMP-12 | 0.894
72 | Cadherin E; MMP-7; TPSB2; GAPDH, liver; MMP-12 | 0.894
73 | C9; LRIG3; Cadherin-6; GAPDH, liver; MMP-12 | 0.894
74 | MMP-7; LRIG3; CSK; GAPDH, liver; MMP-12 | 0.894
75 | MMP-7; LRIG3; Thrombospondin-1; GAPDH, liver; MMP-12 | 0.894
76 | ERBB1; MMP-7; C9; GAPDH, liver; MMP-12 | 0.894
77 | ERBB1; C9; BMP-1; GAPDH, liver; MMP-12 | 0.894
78 | ERBB1; C9; LGMN; GAPDH, liver; MMP-12 | 0.894
79 | C9; Adiponectin; LGMN; GAPDH, liver; MMP-12 | 0.894
80 | MMP-7; BMP-1; CDK5/p35; GAPDH, liver; MMP-12 | 0.893
81 | MMP-7; BMP-1; LGMN; GAPDH, liver; MMP-12 | 0.893
82 | C9; LRIG3; GAPDH, liver; NACA; MMP-12 | 0.893
83 | C9; Kallikrein 7; GAPDH, liver; LDH-H 1; MMP-12 | 0.893
84 | C9; Kallikrein 7; Adiponectin; GAPDH, liver; MMP-12 | 0.893
85 | IGFBP-2; MMP-7; Cadherin-6; GAPDH, liver; MMP-12 | 0.893
86 | IGFBP-2; C9; LGMN; GAPDH, liver; MMP-12 | 0.893
87 | MMP-7; BMP-1; TPSB2; GAPDH, liver; MMP-12 | 0.893
88 | C9; LRIG3; Adiponectin; GAPDH, liver; MMP-12 | 0.893
89 | C9; TPSB2; LGMN; GAPDH, liver; MMP-12 | 0.893
90 | MMP-7; C9; CD30 Ligand; GAPDH, liver; MMP-12 | 0.893
91 | Cadherin E; MMP-7; LRIG3; GAPDH, liver; MMP-12 | 0.893
92 | SCF sR; IGFBP-2; C9; GAPDH, liver; MMP-12 | 0.893
93 | HMG-1; IGFBP-2; MMP-7; GAPDH, liver; MMP-12 | 0.893
94 | SCF sR; Cadherin E; MMP-7; GAPDH, liver; MMP-12 | 0.893
95 | SCF sR; MMP-7; Thrombospondin-1; GAPDH, liver; MMP-12 | 0.893
96 | IGFBP-2; C9; BMP-1; GAPDH, liver; MMP-12 | 0.893
97 | MMP-7; LRIG3; GAPDH, liver; TCTP; MMP-12 | 0.893
98 | C9; LRIG3; GAPDH, liver; LDH-H 1; MMP-12 | 0.893
99 | SCF sR; C9; TPSB2; GAPDH, liver; MMP-12 | 0.893
100 | IGFBP-2; Macrophage mannose receptor; MMP-7; GAPDH, liver; MMP-12 | 0.893
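
By way of non-limiting illustration, the AUC reported for each panel in Tables 23 to 25 can be read as the probability that a randomly chosen lung cancer sample receives a higher classifier score than a randomly chosen control sample. The following Python sketch computes an AUC in that pairwise manner. It is not the procedure used to generate the tables; the function name `auc` and the score values are hypothetical and are provided only to make the definition concrete.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve by pairwise comparison (Mann-Whitney form).

    Each case/control pair counts 1 if the case scores higher,
    0.5 if the scores tie, and 0 otherwise.
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical classifier scores, not data from this application.
cases = [0.9, 0.8, 0.4]
controls = [0.5, 0.3, 0.2]
print(auc(cases, controls))  # 8 of the 9 pairs are ordered correctly
```

An AUC of 0.5 corresponds to a score that does not separate the two groups, and an AUC of 1.0 corresponds to complete separation.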

What is claimed is:
 1. A method for diagnosing that an individual does or does not have lung cancer, the method comprising: detecting, in a biological sample from an individual, biomarker values that each correspond to one of at least N biomarkers selected from Table 21, wherein said individual is classified as having or not having lung cancer based on said biomarker values, and wherein N=2-86.
 2. The method of claim 1, wherein detecting the biomarker values comprises performing an in vitro assay.
 3. The method of claim 2, wherein said in vitro assay comprises at least one capture reagent corresponding to each of said biomarkers, and further comprising selecting said at least one capture reagent from the group consisting of aptamers, antibodies, and a nucleic acid probe.
 4. The method of claim 3, wherein said at least one capture reagent is an aptamer.
 5. The method of claim 2, wherein the in vitro assay is selected from the group consisting of an immunoassay, an aptamer-based assay, a histological or cytological assay, and an mRNA expression level assay.
 6. The method of claim 1, wherein each biomarker value is evaluated based on a predetermined value or a predetermined range of values.
 7. The method of claim 1, wherein the biological sample is lung tissue and wherein the biomarker values derive from a histological or cytological analysis of said lung tissue.
 8. The method of claim 1, wherein the biological sample is selected from the group consisting of whole blood, plasma, and serum.
 9. The method of claim 1, wherein the biological sample is serum.
 10. The method of claim 1, wherein the individual is a human.
 11. The method of claim 1, wherein N=2-15.
 12. The method of claim 1, wherein N=2-10.
 13. The method of claim 1, wherein N=3-10.
 14. The method of claim 1, wherein N=4-10.
 15. The method of claim 1, wherein N=5-10.
 16. The method of claim 1, wherein the individual is a smoker.
 17. The method of claim 1, wherein the individual has a pulmonary nodule.
 18. The method of claim 1, wherein the lung cancer is non-small cell lung cancer.
 19. A computer-implemented method for indicating a likelihood of lung cancer, the method comprising: retrieving on a computer biomarker information for an individual, wherein the biomarker information comprises biomarker values that each correspond to one of at least N biomarkers selected from Table 21; performing with the computer a classification of each of said biomarker values; and indicating a likelihood that said individual has lung cancer based upon a plurality of classifications, and wherein N=2-86.
 20. The method of claim 19, wherein indicating the likelihood that the individual has lung cancer comprises displaying the likelihood on a computer display.
 21. A computer program product for indicating a likelihood of lung cancer, the computer program product comprising: a computer readable medium embodying program code executable by a processor of a computing device or system, the program code comprising: code that retrieves data attributed to a biological sample from an individual, wherein the data comprises biomarker values that each correspond to one of at least N biomarkers selected from Table 21, wherein said biomarkers were detected in the biological sample; and code that executes a classification method that indicates a lung disease status of the individual as a function of said biomarker values; and wherein N=2-86.
 22. The computer program product of claim 21, wherein said classification method uses a probability density function.
 23. The computer program product of claim 22, wherein said classification method uses two or more classes.
 24. A method for diagnosing that an individual does or does not have lung cancer, the method comprising: detecting, in a biological sample from an individual, biomarker values that each correspond to one of at least N biomarkers selected from Table 20, wherein said individual is classified as having or not having lung cancer based on said biomarker values, and wherein N=2-25.
 25. The method of claim 24, wherein detecting the biomarker values comprises performing an in vitro assay.
 26. The method of claim 25, wherein said in vitro assay comprises at least one capture reagent corresponding to each of said biomarkers, and further comprising selecting said at least one capture reagent from the group consisting of aptamers, antibodies, and a nucleic acid probe.
 27. The method of claim 26, wherein said at least one capture reagent is an aptamer.
 28. The method of claim 25, wherein the in vitro assay is selected from the group consisting of an immunoassay, an aptamer-based assay, a histological or cytological assay, and an mRNA expression level assay.
 29. The method of claim 24, wherein each biomarker value is evaluated based on a predetermined value or a predetermined range of values.
 30. The method of claim 24, wherein the biological sample is lung tissue and wherein the biomarker values derive from a histological or cytological analysis of said lung tissue.
 31. The method of claim 24, wherein the biological sample is selected from the group consisting of whole blood, plasma, and serum.
 32. The method of claim 24, wherein the biological sample is serum.
 33. The method of claim 24, wherein the individual is a human.
 34. The method of claim 24, wherein N=2-15.
 35. The method of claim 24, wherein N=2-10.
 36. The method of claim 24, wherein N=3-10.
 37. The method of claim 24, wherein N=4-10.
 38. The method of claim 24, wherein N=5-10.
 39. The method of claim 24, wherein the individual is a smoker.
 40. The method of claim 24, wherein the individual has a pulmonary nodule.
 41. The method of claim 24, wherein the lung cancer is non-small cell lung cancer.
 42. A computer-implemented method for indicating a likelihood of lung cancer, the method comprising: retrieving on a computer biomarker information for an individual, wherein the biomarker information comprises biomarker values that each correspond to one of at least N biomarkers selected from Table 20; performing with the computer a classification of each of said biomarker values; and indicating a likelihood that said individual has lung cancer based upon a plurality of classifications, and wherein N=2-25.
 43. The method of claim 42, wherein indicating the likelihood that the individual has lung cancer comprises displaying the likelihood on a computer display.
 44. A computer program product for indicating a likelihood of lung cancer, the computer program product comprising: a computer readable medium embodying program code executable by a processor of a computing device or system, the program code comprising: code that retrieves data attributed to a biological sample from an individual, wherein the data comprises biomarker values that each correspond to one of at least N biomarkers selected from Table 20, wherein said biomarkers were detected in the biological sample; and code that executes a classification method that indicates a lung disease status of the individual as a function of said biomarker values; and wherein N=2-25.
 45. The computer program product of claim 44, wherein said classification method uses a probability density function.
 46. The computer program product of claim 44, wherein said classification method uses two or more classes.
 47. A method for diagnosing that an individual does or does not have lung cancer, the method comprising: detecting, in a biological sample from an individual, biomarker values that each correspond to one of at least N biomarkers selected from Table 18, wherein said individual is classified as having or not having lung cancer based on said biomarker values, and wherein N=2-36.
 48. The method of claim 47, wherein detecting the biomarker values comprises performing an in vitro assay.
 49. The method of claim 48, wherein said in vitro assay comprises at least one capture reagent corresponding to each of said biomarkers, and further comprising selecting said at least one capture reagent from the group consisting of aptamers, antibodies, and a nucleic acid probe.
 50. The method of claim 49, wherein said at least one capture reagent is an aptamer.
 51. The method of claim 48, wherein the in vitro assay is selected from the group consisting of an immunoassay, an aptamer-based assay, a histological or cytological assay, and an mRNA expression level assay.
 52. The method of claim 47, wherein each biomarker value is evaluated based on a predetermined value or a predetermined range of values.
 53. The method of claim 47, wherein the biological sample is lung tissue and wherein the biomarker values derive from a histological or cytological analysis of said lung tissue.
 54. The method of claim 47, wherein the biological sample is selected from the group consisting of whole blood, plasma, and serum.
 55. The method of claim 47, wherein the biological sample is serum.
 56. The method of claim 47, wherein the individual is a human.
 57. The method of claim 47, wherein N=2-15.
 58. The method of claim 47, wherein N=2-10.
 59. The method of claim 47, wherein N=3-10.
 60. The method of claim 47, wherein N=4-10.
 61. The method of claim 47, wherein N=5-10.
 62. The method of claim 47, wherein the individual is a smoker.
 63. The method of claim 47, wherein the individual has a pulmonary nodule.
 64. The method of claim 47, wherein the lung cancer is non-small cell lung cancer.
 65. A computer-implemented method for indicating a likelihood of lung cancer, the method comprising: retrieving on a computer biomarker information for an individual, wherein the biomarker information comprises biomarker values that each correspond to one of at least N biomarkers selected from Table 18; performing with the computer a classification of each of said biomarker values; and indicating a likelihood that said individual has lung cancer based upon a plurality of classifications, and wherein N=2-36.
 66. The method of claim 65, wherein indicating the likelihood that the individual has lung cancer comprises displaying the likelihood on a computer display.
 67. A computer program product for indicating a likelihood of lung cancer, the computer program product comprising: a computer readable medium embodying program code executable by a processor of a computing device or system, the program code comprising: code that retrieves data attributed to a biological sample from an individual, wherein the data comprises biomarker values that each correspond to one of at least N biomarkers selected from Table 18, wherein said biomarkers were detected in the biological sample; and code that executes a classification method that indicates a lung disease status of the individual as a function of said biomarker values; and wherein N=2-36.
 68. The computer program product of claim 67, wherein said classification method uses a probability density function.
 69. The computer program product of claim 68, wherein said classification method uses two or more classes.
 70. A method for determining information about lung cancer in an individual comprising: detecting, in a biological sample from an individual, biomarker values that each correspond to one of at least N biomarkers selected from Table 21, wherein N=2-86; and wherein said biomarker value provides information about the lung cancer in the individual.
 71. The method of claim 70, wherein detecting the biomarker values comprises performing an in vitro assay.
 72. The method of claim 71, wherein said in vitro assay comprises at least one capture reagent corresponding to each of said biomarkers, and further comprising selecting said at least one capture reagent from the group consisting of aptamers, antibodies, and a nucleic acid probe.
 73. The method of claim 72, wherein said at least one capture reagent is an aptamer.
 74. The method of claim 71, wherein the in vitro assay is selected from the group consisting of an immunoassay, an aptamer-based assay, a histological or cytological assay, and an mRNA expression level assay.
 75. The method of claim 70, wherein each biomarker value is evaluated based on a predetermined value or a predetermined range of values.
 76. The method of claim 70, wherein the biological sample is lung tissue and wherein the biomarker values derive from a histological or cytological analysis of said lung tissue.
 77. The method of claim 70, wherein the biological sample is selected from the group consisting of whole blood, plasma, and serum.
 78. The method of claim 70, wherein the biological sample is serum.
 79. The method of claim 70, wherein the individual is a human.
 80. The method of claim 70, wherein N=2-15.
 81. The method of claim 70, wherein N=2-10.
 82. The method of claim 70, wherein N=3-10.
 83. The method of claim 70, wherein N=4-10.
 84. The method of claim 70, wherein N=5-10.
 85. The method of claim 70, wherein the individual is a smoker.
 86. The method of claim 70, wherein the individual has a pulmonary nodule.
 87. The method of claim 70, wherein the lung cancer is non-small cell lung cancer.
 88. The method of claim 70, wherein the information comprises prognosis, cancer classification, prediction of disease risk, or selection of treatment.
 89. A method for determining information about lung cancer in an individual comprising: detecting, in a biological sample from an individual, biomarker values that each correspond to one of at least N biomarkers selected from Table 18, wherein N=2-36; and wherein said biomarker value provides information about the lung cancer in the individual.
 90. The method of claim 89, wherein detecting the biomarker values comprises performing an in vitro assay.
 91. The method of claim 90, wherein said in vitro assay comprises at least one capture reagent corresponding to each of said biomarkers, and further comprising selecting said at least one capture reagent from the group consisting of aptamers, antibodies, and a nucleic acid probe.
 92. The method of claim 91, wherein said at least one capture reagent is an aptamer.
 93. The method of claim 90, wherein the in vitro assay is selected from the group consisting of an immunoassay, an aptamer-based assay, a histological or cytological assay, and an mRNA expression level assay.
 94. The method of claim 89, wherein each biomarker value is evaluated based on a predetermined value or a predetermined range of values.
 95. The method of claim 89, wherein the biological sample is lung tissue and wherein the biomarker values derive from a histological or cytological analysis of said lung tissue.
 96. The method of claim 89, wherein the biological sample is selected from the group consisting of whole blood, plasma, and serum.
 97. The method of claim 89, wherein the biological sample is serum.
 98. The method of claim 89, wherein the individual is a human.
 99. The method of claim 89, wherein N=2-15.
 100. The method of claim 89, wherein N=2-10.
 101. The method of claim 89, wherein N=3-10.
 102. The method of claim 89, wherein N=4-10.
 103. The method of claim 89, wherein N=5-10.
 104. The method of claim 89, wherein the individual is a smoker.
 105. The method of claim 89, wherein the individual has a pulmonary nodule.
 106. The method of claim 89, wherein the lung cancer is non-small cell lung cancer.
 107. The method of claim 89, wherein the information comprises prognosis, cancer classification, prediction of disease risk, or selection of treatment.
 108. A method for determining information about lung cancer in an individual comprising: detecting, in a biological sample from an individual, biomarker values that each correspond to one of at least N biomarkers selected from Table 20, wherein N=2-25; and wherein said biomarker value provides information about the lung cancer in the individual.
 109. The method of claim 108, wherein detecting the biomarker values comprises performing an in vitro assay.
 110. The method of claim 109, wherein said in vitro assay comprises at least one capture reagent corresponding to each of said biomarkers, and further comprising selecting said at least one capture reagent from the group consisting of aptamers, antibodies, and a nucleic acid probe.
 111. The method of claim 110, wherein said at least one capture reagent is an aptamer.
 112. The method of claim 109, wherein the in vitro assay is selected from the group consisting of an immunoassay, an aptamer-based assay, a histological or cytological assay, and an mRNA expression level assay.
 113. The method of claim 108, wherein each biomarker value is evaluated based on a predetermined value or a predetermined range of values.
 114. The method of claim 108, wherein the biological sample is lung tissue and wherein the biomarker values derive from a histological or cytological analysis of said lung tissue.
 115. The method of claim 108, wherein the biological sample is selected from the group consisting of whole blood, plasma, and serum.
 116. The method of claim 108, wherein the biological sample is serum.
 117. The method of claim 108, wherein the individual is a human.
 118. The method of claim 108, wherein N=2-15.
 119. The method of claim 108, wherein N=2-10.
 120. The method of claim 108, wherein N=3-10.
 121. The method of claim 108, wherein N=4-10.
 122. The method of claim 108, wherein N=5-10.
 123. The method of claim 108, wherein the individual is a smoker.
 124. The method of claim 108, wherein the individual has a pulmonary nodule.
 125. The method of claim 108, wherein the lung cancer is non-small cell lung cancer.
 126. The method of claim 108, wherein the information comprises prognosis, cancer classification, prediction of disease risk, or selection of treatment.
 127. A method for diagnosing that an individual does or does not have lung cancer, the method comprising: detecting, in a biological sample from an individual, biomarker values that each correspond to one of at least N biomarkers selected from Table 21, wherein said individual is classified as having or not having lung cancer based on said biomarker values, wherein N=2-86, and wherein at least one of said biomarkers is selected from Table 20.
 128. The method of claim 127, wherein at least one of said biomarkers selected from Table 20 is MMP-12.
 129. The method of claim 127, wherein detecting the biomarker values comprises performing an in vitro assay.
 130. The method of claim 129, wherein said in vitro assay comprises at least one capture reagent corresponding to each of said biomarkers, and further comprising selecting said at least one capture reagent from the group consisting of aptamers, antibodies, and a nucleic acid probe.
 131. The method of claim 130, wherein said at least one capture reagent is an aptamer.
 132. The method of claim 129, wherein the in vitro assay is selected from the group consisting of an immunoassay, an aptamer-based assay, a histological or cytological assay, and an mRNA expression level assay.
 133. The method of claim 127, wherein each biomarker value is evaluated based on a predetermined value or a predetermined range of values.
 134. The method of claim 127, wherein the biological sample is lung tissue and wherein the biomarker values derive from a histological or cytological analysis of said lung tissue.
 135. The method of claim 127, wherein the biological sample is selected from the group consisting of whole blood, plasma, and serum.
 136. The method of claim 127, wherein the biological sample is serum.
 137. The method of claim 127, wherein the individual is a human.
 138. The method of claim 127, wherein N=2-15.
 139. The method of claim 127, wherein N=2-10.
 140. The method of claim 127, wherein N=3-10.
 141. The method of claim 127, wherein N=4-10.
 142. The method of claim 127, wherein N=5-10.
 143. The method of claim 127, wherein the individual is a smoker.
 144. The method of claim 127, wherein the individual has a pulmonary nodule.
 145. The method of claim 127, wherein the lung cancer is non-small cell lung cancer.
 146. A computer-implemented method for indicating a likelihood of lung cancer, the method comprising: retrieving on a computer biomarker information for an individual, wherein the biomarker information comprises biomarker values that each correspond to one of at least N biomarkers selected from Table 21, wherein at least one of said N biomarkers is selected from Table 20; performing with the computer a classification of each of said biomarker values; and indicating a likelihood that said individual has lung cancer based upon a plurality of classifications, and wherein N=2-86.
 147. The method of claim 146, wherein at least one of said biomarker(s) selected from Table 20 is MMP-12.
 148. The method of claim 146, wherein indicating the likelihood that the individual has lung cancer comprises displaying the likelihood on a computer display.
 149. A computer program product for indicating a likelihood of lung cancer, the computer program product comprising: a computer readable medium embodying program code executable by a processor of a computing device or system, the program code comprising: code that retrieves data attributed to a biological sample from an individual, wherein the data comprises biomarker values that each correspond to one of at least N biomarkers selected from Table 21, wherein at least one of said N biomarkers is selected from Table 20, and wherein said biomarkers were detected in the biological sample; and code that executes a classification method that indicates a lung disease status of the individual as a function of said biomarker values; and wherein N=2-86.
 150. The computer program product of claim 149, wherein at least one of said biomarker(s) selected from Table 20 is MMP-12.
 151. The computer program product of claim 149, wherein said classification method uses a probability density function.
 152. The computer program product of claim 149, wherein said classification method uses two or more classes.
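
By way of non-limiting illustration of a classification method that uses a probability density function and two classes, as recited in the computer program product claims above, the following Python sketch combines per-biomarker Gaussian densities in a naive Bayes manner to yield a likelihood of lung cancer. It is one possible realization under stated assumptions, not the implementation of the application: the function names, the Gaussian form, the equal class prior, and every mean and standard deviation shown are hypothetical. Only the biomarker names are taken from the tables.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Normal probability density evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def cancer_likelihood(values, params, prior=0.5):
    """Posterior probability of the 'cancer' class under naive Bayes.

    values: {biomarker name: measured value}
    params: {biomarker name: {"cancer": (mean, sd), "control": (mean, sd)}}
    prior:  assumed prior probability of the 'cancer' class
    """
    log_odds = math.log(prior / (1.0 - prior))
    for name, x in values.items():
        p_cancer = gaussian_pdf(x, *params[name]["cancer"])
        p_control = gaussian_pdf(x, *params[name]["control"])
        # Each biomarker contributes its log density ratio independently.
        log_odds += math.log(p_cancer / p_control)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical class-conditional parameters (arbitrary units).
params = {
    "MMP-12": {"cancer": (2.0, 1.0), "control": (0.0, 1.0)},
    "C9": {"cancer": (1.0, 1.0), "control": (0.0, 1.0)},
}

sample = {"MMP-12": 2.0, "C9": 1.0}  # hypothetical measurements
print(cancer_likelihood(sample, params))
```

A value measured exactly midway between the two class means contributes a density ratio of one and therefore leaves the likelihood at the prior. A practical implementation would also guard against densities that underflow to zero.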